Abstract. Bifurcating autoregressive processes, which can be seen as an adaptation of
autoregressive processes for a binary tree structure, have been extensively studied during the last
decade in a parametric context. In this work we do not specify any a priori form for the two
autoregressive functions and we use nonparametric techniques. We investigate both the nonasymptotic
and asymptotic behaviour of Nadaraya-Watson type estimators of the autoregressive functions.
We build our estimators observing the process on a finite subtree denoted by $\mathbb{T}_n$,
up to the depth $n$. The estimators achieve the classical rate $|\mathbb{T}_n|^{-\beta/(2\beta+1)}$
in quadratic loss over Hölder classes of smoothness $\beta$. We prove almost sure convergence and
asymptotic normality, giving the bias expression when choosing the optimal bandwidth. Finally, we
address the question of asymmetry: we develop an asymptotic test for the equality of the two
autoregressive functions, which we implement on both simulated and real data.
1. Introduction

1.1. A generalisation of the bifurcating autoregressive model. 

BAR process. Roughly speaking, bifurcating autoregressive processes, BAR processes for short,
are an adaptation of autoregressive processes when the index set has a binary tree structure.
BAR processes were introduced by Cowan and Staudte [16] in 1986 in order to study cell division
in Escherichia coli bacteria. For $m \geq 0$, let $\mathbb{G}_m = \{0,1\}^m$ (with
$\mathbb{G}_0 = \{\emptyset\}$) and introduce the infinite genealogical tree

    $\mathbb{T} = \bigcup_{m=0}^{\infty} \mathbb{G}_m$.

For $u \in \mathbb{G}_m$, set $|u| = m$ and define the concatenation $u0 = (u,0) \in \mathbb{G}_{m+1}$
and $u1 = (u,1) \in \mathbb{G}_{m+1}$. In [16], the original BAR process is defined as follows.
A cell $u \in \mathbb{T}$ with generation time $X_u$ gives rise to the offspring $(u0, u1)$ with
generation times

    $X_{u0} = a + b X_u + \varepsilon_{u0}$,
    $X_{u1} = a + b X_u + \varepsilon_{u1}$,

where $a$ and $b$ are unknown real parameters, with $|b| < 1$, which measures heredity in the
transmission of the biological feature. The noise sequence
$((\varepsilon_{u0}, \varepsilon_{u1}), u \in \mathbb{T})$ forms a sequence of independent
and identically distributed bivariate centered Gaussian random variables and represents
environmental effects; the initial value $X_\emptyset$ is drawn according to a Gaussian law.
Since then, several extensions of this model have been studied and various estimators for
unknown parameters have been proposed. First, one can mention [27] where Guyon introduces
asymmetry to take into account the fact that autoregressive parameters for type 0 or type 1 cells
can differ. Introducing the bifurcating Markov chain theory, Guyon studies the asymptotic behaviour
of the least squares estimators of the unknown parameters. He also introduces some asymptotic
tests which make it possible to decide whether the model is symmetric or not. Several extensions of
this linear model have been proposed and studied from a parametric point of view, see for instance
Basawa and Huggins [2, 3] and Basawa and Zhou [4, 5] where the BAR process is studied for
non-Gaussian noise and long memory. Around 2010, Bercu, Blandin, Delmas, de Saporta, Gégout-Petit
and Marsalle extended the study of the BAR process in different directions. Bercu et al. [7] use a
martingale approach in order to study least squares estimators of unknown parameters for processes
with memory greater than 1 and without normality assumption on the noise sequence. Even more
recently, [20, 21] take into account missing data and [6, 14, 22] study the model with random
coefficients. A number of other extensions have also been studied: one can cite Delmas and Marsalle
[17] for a generalisation to Galton-Watson trees and Bitseki Penda and Djellout [10] for deviation
inequalities and moderate deviations.

Nonlinear BAR process. Nonlinear bifurcating autoregressive (NBAR, for short) processes
generalize BAR processes, avoiding an a priori linear specification on the two autoregressive
functions. Let us introduce precisely a NBAR process, which is specified by 1) a filtered
probability space $(\Omega, \mathcal{F}, (\mathcal{F}_m)_{m \geq 0}, \mathbb{P})$, together with a
measurable state space $(\mathbb{R}, \mathfrak{B})$, 2) two measurable functions
$f_0, f_1 : \mathbb{R} \to \mathbb{R}$ and 3) a probability density $G$ on
$(\mathbb{R} \times \mathbb{R}, \mathfrak{B} \otimes \mathfrak{B})$ with a null first order moment.
In this setting we have the following

Definition 1. A NBAR process is a family $(X_u)_{u \in \mathbb{T}}$ of random variables with value
in $(\mathbb{R}, \mathfrak{B})$ such that, for every $u \in \mathbb{T}$, $X_u$ is
$\mathcal{F}_{|u|}$-measurable and

    $X_{u0} = f_0(X_u) + \varepsilon_{u0}$ and $X_{u1} = f_1(X_u) + \varepsilon_{u1}$,

where $((\varepsilon_{u0}, \varepsilon_{u1}))_{u \in \mathbb{T}}$ is a sequence of independent
bivariate random variables with common density $G$.

The distribution of $(X_u)_{u \in \mathbb{T}}$ is thus entirely determined by the autoregressive
functions $(f_0, f_1)$, the noise density $G$ and an initial distribution for $X_\emptyset$.
Informally, we view $\mathbb{T}$ as a population of individuals, cells or particles whose evolution
is governed by the following dynamics: to each $u \in \mathbb{T}$ we associate a trait $X_u$ (its
size, lifetime, growth rate, DNA content and so on) with value in $\mathbb{R}$. At its time of
death, the particle $u$ gives rise to two children $u0$ and $u1$. Conditional on $X_u = x$,
the trait $(X_{u0}, X_{u1}) \in \mathbb{R}^2$ of the offspring of $u$ is a perturbed version of
$(f_0(x), f_1(x))$.

The strength of this model lies in the fact that there is no constraint on the form of $f_0$ and
$f_1$. This is particularly interesting in view of applications for which no a priori
knowledge is available on the autoregressive links.

When $X_\emptyset$ is distributed according to a measure $\mu(dx)$ on $(\mathbb{R}, \mathfrak{B})$,
we denote by $\mathbb{P}_\mu$ the law of the NBAR process $(X_u)_{u \in \mathbb{T}}$ and by
$\mathbb{E}_\mu[\cdot]$ the expectation with respect to the probability $\mathbb{P}_\mu$.
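To make the dynamics of Definition 1 concrete, here is a minimal simulation sketch. It assumes a bivariate Gaussian noise and borrows the trial functions of Section 4.1; the function names, the list-of-generations layout (children of position j stored at positions 2j and 2j+1) and the default parameters are illustrative choices, not part of the model.

```python
import math
import random


def f0(x):
    # Trial autoregressive function for type-0 cells (taken from Section 4.1).
    return x * (0.25 + math.exp(-x * x) / 2)


def f1(x):
    # Trial autoregressive function for type-1 cells (taken from Section 4.1).
    return x * (0.125 + math.exp(-x * x) / 2)


def simulate_nbar(func0, func1, depth, sigma0=1.0, sigma1=1.0, rho=0.3,
                  x_root=1.0, seed=None):
    """Simulate an NBAR tree with correlated centered Gaussian noise.

    Returns a list `gens`; gens[m] holds the 2**m traits of generation m, and
    the cell at position j has its children at positions 2*j (type 0) and
    2*j + 1 (type 1) of gens[m + 1]. This storage layout is an assumption.
    """
    rng = random.Random(seed)
    gens = [[x_root]]
    for _ in range(depth):
        children = []
        for x in gens[-1]:
            # Build a correlated Gaussian pair from two independent normals.
            z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            eps0 = sigma0 * z0
            eps1 = sigma1 * (rho * z0 + math.sqrt(1.0 - rho * rho) * z1)
            children.append(func0(x) + eps0)
            children.append(func1(x) + eps1)
        gens.append(children)
    return gens
```

For instance, `simulate_nbar(f0, f1, depth=15, seed=0)` produces generations 0 to 15, that is $2^{16} - 1$ traits in total.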

1.2. Nonparametric estimators of the autoregressive functions. 

Estimation of the autoregressive functions. Our aim is to estimate the unknown autoregressive
functions $f_0$ and $f_1$ in Definition 1 from the observation of a subpopulation. For that
purpose, we propose to make use of a Nadaraya-Watson type estimator (introduced independently by
Nadaraya [36] and Watson [44] in 1964). The Nadaraya-Watson estimator is a kernel estimator of the
regression function $\mathbb{E}[Y | X = x]$ when observing $n$ pairs $(X_1, Y_1), \ldots, (X_n, Y_n)$
of independent random variables distributed as $(X, Y)$. The Nadaraya-Watson estimator was also
used in the framework of autoregressive time series in order to reconstruct
$\mathbb{E}[X_n | X_{n-1} = x]$, assuming that $(X_n)_{n \geq 0}$ is stationary, see [39, 30].
We generalize here the use of the Nadaraya-Watson estimator to the case of an autoregressive
process indexed by a binary tree, i.e. a NBAR process.

For $n \geq 0$, introduce the genealogical tree up to the $(n+1)$-th generation,
$\mathbb{T}_{n+1} = \bigcup_{m=0}^{n+1} \mathbb{G}_m$. Assume we observe
$X^{n+1} = (X_u)_{u \in \mathbb{T}_{n+1}}$, i.e. we have $|\mathbb{T}_{n+1}| = 2^{n+2} - 1$ random
variables with value in $\mathbb{R}$. Let $\mathcal{D} \subset \mathbb{R}$ be a compact interval.
We propose to estimate $(f_0(x), f_1(x))$, the autoregressive functions at point
$x \in \mathcal{D}$, from the observations $X^{n+1}$ by

(1)    $\hat f_{\iota,n}(x) = \dfrac{|\mathbb{T}_n|^{-1} \sum_{u \in \mathbb{T}_n} K_{h_n}(x - X_u) X_{u\iota}}{\big(|\mathbb{T}_n|^{-1} \sum_{u \in \mathbb{T}_n} K_{h_n}(x - X_u)\big) \vee \varpi_n}$, $\iota \in \{0,1\}$,

where $\varpi_n > 0$ and we set $K_{h_n}(\cdot) = h_n^{-1} K(h_n^{-1} \cdot)$ for $h_n > 0$ and a
kernel function $K : \mathbb{R} \to \mathbb{R}$ such that $\int_{\mathbb{R}} K = 1$.
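The estimator (1) is a ratio of two kernel averages, with the denominator floored at a threshold. The sketch below is one possible reading of it in code; the names `nw_estimate`, `parents`, `children` and `floor` are hypothetical, and a Gaussian kernel is assumed for convenience (it is the kernel used in the numerical section, although it does not have compact support).

```python
import math


def gaussian_kernel(t):
    # Standard Gaussian density, used here as the kernel K.
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def nw_estimate(x, parents, children, h, floor=0.0, kernel=gaussian_kernel):
    """Nadaraya-Watson type estimate at x from pairs (X_u, X_{u iota}).

    numerator   = average of K_h(x - X_u) * X_{u iota}
    denominator = max(average of K_h(x - X_u), floor)
    With floor=0.0 the call raises ZeroDivisionError if every weight underflows.
    """
    n = len(parents)
    weights = [kernel((x - xu) / h) / h for xu in parents]  # K_h(x - X_u)
    numerator = sum(w * c for w, c in zip(weights, children)) / n
    denominator = max(sum(weights) / n, floor)
    return numerator / denominator
```

Here `parents` lists the traits $X_u$, $u \in \mathbb{T}_n$, and `children` the matching traits $X_{u\iota}$ for a fixed type $\iota$; calling the function once per type gives the pair of estimates.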

Main theoretical results and outline. Our first objective in this work is to study the estimators
(1) from both nonasymptotic and asymptotic points of view. To the best of our knowledge, there is
no extensive nonparametric study for nonlinear bifurcating autoregressive processes. We can mention
the applications of Bitseki Penda, Escobar-Bach and Guillin [12] (Section 4) where deviation
inequalities are derived for Nadaraya-Watson type estimators of the autoregressive functions. We
also refer to Bitseki Penda, Hoffmann and Olivier [13] where some characteristics of a NBAR
process are estimated nonparametrically (the invariant measure, the mean-transition and the
$\mathbb{T}$-transition).

Our nonasymptotic study includes the control of the quadratic loss in a minimax sense
(Theorems 5 and 6) and our asymptotic study includes almost sure convergence (Proposition 7) and
asymptotic normality (Theorems 8 and 9). To this end, we shall make use of nonasymptotic
behaviour for bifurcating Markov chains (see [27, 11]) and asymptotic behaviour of martingales. We
are also interested in comparing the two autoregressive functions $f_0$ and $f_1$ and in testing
whether the phenomenon studied is symmetric or not. The test we build (in Theorem 11) to do so
relies on our asymptotic results (see Corollary 10).

The present work is organised as follows. The results are obtained under the assumption of
geometric ergodicity of the so-called tagged-branch Markov chain we define in Section 2, together
with the nonasymptotic results. In Section 3, we state asymptotic results for our estimators which
enable us to address the question of asymmetry and build a test to compare $f_0$ and $f_1$. We also
give numerical results to illustrate the estimation and test strategies on both simulated and real
data (Section 4). Section 5 contains a discussion. The last parts of the article, Section 6 with an
appendix in Section 7, are devoted to the proofs of our results.

2. Nonasymptotic behaviour 

2.1. Tagged-branch chain. In this section we build a chain $(Y_m)_{m \geq 0}$ corresponding to a
lineage taken randomly (uniformly at each branching event) in the population, a key object in our
study. Let $(\iota_m)_{m \geq 1}$ be a sequence of independent and identically distributed random
variables that have the Bernoulli distribution of parameter 1/2. Introduce

(2)    $G_0(\cdot) = \int_{\mathbb{R}} G(\cdot, y)\,dy$ and $G_1(\cdot) = \int_{\mathbb{R}} G(x, \cdot)\,dx$,

the marginals of the noise density $G$, and let $(\varepsilon'_m)_{m \geq 1}$ be a sequence of
independent random variables such that $\varepsilon'_m$ given $\iota_m = \iota$ has density
$G_\iota$, for $\iota \in \{0,1\}$. In addition $\iota_m$ and $\varepsilon'_m$ are independent
of $(Y_k)_{0 \leq k \leq m-1}$. We now set

(3)    $Y_0 = X_\emptyset$,
       $Y_m = f_{\iota_m}(Y_{m-1}) + \varepsilon'_m$, $m \geq 1$.

Then the so-called tagged-branch chain $Y = (Y_m)_{m \geq 0}$ has transition

(4)    $Q(x, dy) = \frac{1}{2}\big(G_0(y - f_0(x)) + G_1(y - f_1(x))\big)\,dy$,

which means that for all $m \geq 1$, we have $\mathbb{P}(Y_m \in dy \,|\, Y_0 = x) = Q^m(x, dy)$
where we set

    $Q^m(x, dy) = \int_{z \in \mathbb{R}} Q(x, dz)\, Q^{m-1}(z, dy)$ with $Q^0(x, dy) = \delta_x(dy)$

for the $m$-th iteration of $Q$.

Asymptotic and nonasymptotic studies have shown that the limit law of the Markov chain $Y$
plays an important role; we refer to Bitseki Penda, Djellout and Guillin [11] and references therein
for more details (in the general setting of bifurcating Markov chains). In the present work, the
tagged-branch Markov chain will play a crucial role in the analysis of the estimators of the
autoregressive functions (see footnote 1).
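The construction (3) of the tagged-branch chain can be sketched in a few lines: at each step a child type is drawn with probability 1/2 and the matching autoregressive function and noise marginal are applied. The sketch assumes Gaussian marginals $G_0$ and $G_1$ with standard deviations `sigma0` and `sigma1`; all names are hypothetical.

```python
import random


def tagged_branch_chain(func0, func1, n_steps, sigma0=1.0, sigma1=1.0,
                        y0=1.0, seed=None):
    """Sample Y_0, ..., Y_{n_steps} along a uniformly chosen lineage.

    Assumes centered Gaussian noise marginals (an illustrative choice).
    """
    rng = random.Random(seed)
    path = [y0]
    for _ in range(n_steps):
        iota = rng.randrange(2)  # Bernoulli(1/2): type of the selected child
        func, sigma = (func0, sigma0) if iota == 0 else (func1, sigma1)
        path.append(func(path[-1]) + sigma * rng.gauss(0.0, 1.0))
    return path
```

A histogram of a long path gives a rough picture of the invariant law of $Y$, the quantity that the denominator of (1) targets.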


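2.2. Model constraints. The autoregressive functions $f_0$ and $f_1$ are assumed to belong to the
following class. For $\gamma \in (0,1)$ and $\ell > 0$, we introduce the class
$\mathcal{F}(\gamma, \ell)$ of continuous functions $f : \mathbb{R} \to \mathbb{R}$ such that

    $|f(x)| \leq \gamma |x| + \ell$ for any $x \in \mathbb{R}$.

The two marginals $G_0$ and $G_1$ of the noise density are assumed to belong to the following
class. For $r > 0$ and $\lambda > 2$, we introduce the class $\mathcal{G}(r, \lambda)$ of
nonnegative continuous functions $g : \mathbb{R} \to [0, \infty)$ such that

    $g(x) \leq \dfrac{r}{1 + |x|^{\lambda}}$ for any $x \in \mathbb{R}$.

When $(G_0, G_1) \in \mathcal{G}(r, \lambda)^2$ for some $\lambda > 3$, we denote the covariance
matrix of $(\varepsilon_0, \varepsilon_1)$, called noise covariance matrix, by

(5)    $\Gamma = \begin{pmatrix} \sigma_0^2 & \varrho \sigma_0 \sigma_1 \\ \varrho \sigma_0 \sigma_1 & \sigma_1^2 \end{pmatrix}$

with $\sigma_0, \sigma_1 > 0$ and $\varrho \in (-1, 1)$.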


It is crucial for our proofs to study the ergodicity of the tagged-branch Markov chain $Y$.
Geometric ergodicity of nonlinear autoregressive processes has been studied in Bhattacharya and Lee
[8] (Theorem 1) and also in An and Huang [1] and Cline [15]. The main difference is that we need
here a precise control on the ergodicity rate, which should be smaller than 1/2, due to the binary
tree structure. We also see, through (3), that the autoregressive function is random in our case.

Footnote 1. More precisely, we will see that the denominator of (1) converges almost surely to the
invariant density of the Markov chain $Y$ (Proposition 16).

The following crucial assumption will guarantee geometric ergodicity of the tagged-branch
Markov chain $Y$ with an exponential decay rate smaller than 1/2 (see Lemma 13). For any
$M > 0$, we set

(6)    $\delta(M) = \min\Big\{ \inf_{|x| \leq M} G_0(x) ;\ \inf_{|x| \leq M} G_1(x) \Big\}$.

Assumption 2. Set $M_0 = \ell + \mathbb{E}[|\varepsilon'_1|] < \infty$ where $\varepsilon'_1$ has
density $(G_0 + G_1)/2$. There exists $\eta > 0$ such that

    $\gamma < 1/2 - \eta$

and there exists $M_1 > 2 M_0 / (1/2 - \eta - \gamma)$ such that

(7)    $2 M_1\, \delta\big((1 + \gamma) M_1 + \ell\big) > \frac{1}{2}$.

The following assumption will guarantee that the invariant density $\nu$ is positive on some
nonempty interval (see Lemma 17). For any $M > 0$, we set

    $\eta(M) = \dfrac{|G_0|_\infty + |G_1|_\infty}{2} \displaystyle\int_{|y| > M} \int_{x \in \mathbb{R}} \dfrac{r}{1 + |y - \gamma|x| - \ell|^{\lambda} \wedge |y + \gamma|x| + \ell|^{\lambda}}\, dx\, dy$,

where, for a function $h : \mathbb{R} \to \mathbb{R}$, $|h|_\infty$ stands for
$\sup_{x \in \mathbb{R}} |h(x)|$.

Assumption 3. For $M_2 > 0$ such that $\eta(M_2) < 1$, there exists $M_3 > \ell + \gamma M_2$ such
that $\delta(M_3) > 0$.


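2.3. Main results. We need the following property on $K$:

Assumption 4. The kernel $K : \mathbb{R} \to \mathbb{R}$ is bounded with compact support and for
some integer $n_0 \geq 1$, we have $\int_{-\infty}^{\infty} x^k K(x)\,dx = \mathbf{1}_{\{k=0\}}$ for
$k = 1, \ldots, n_0$.

Assumption 4 will enable us to have nice approximation results over smooth functions $f_0$ and
$f_1$, described in the following way: for $\mathcal{X} \subseteq \mathbb{R}$ and $\beta > 0$, with
$\beta = \lfloor \beta \rfloor + \{\beta\}$, $0 < \{\beta\} \leq 1$ and $\lfloor \beta \rfloor$
an integer, let $\mathcal{H}^{\beta}_{\mathcal{X}}$ denote the Hölder space of functions
$h : \mathcal{X} \to \mathbb{R}$ possessing a derivative of order $\lfloor \beta \rfloor$ that
satisfies

(8)    $|h^{(\lfloor \beta \rfloor)}(y) - h^{(\lfloor \beta \rfloor)}(x)| \leq c(h)\, |x - y|^{\{\beta\}}$.

The minimal constant $c(h)$ such that (8) holds defines a semi-norm
$|h|_{\mathcal{H}^{\beta}_{\mathcal{X}}}$. We equip the space $\mathcal{H}^{\beta}_{\mathcal{X}}$
with the norm $\|h\|_{\mathcal{H}^{\beta}_{\mathcal{X}}} = \sup_x |h(x)| + |h|_{\mathcal{H}^{\beta}_{\mathcal{X}}}$
and the balls

    $\mathcal{H}^{\beta}_{\mathcal{X}}(L) = \{h : \mathcal{X} \to \mathbb{R},\ \|h\|_{\mathcal{H}^{\beta}_{\mathcal{X}}} \leq L\}$, $L > 0$.

The notation $\propto$ means proportional to, up to some positive constant independent of $n$, and
the notation $\lesssim$ means up to some constant independent of $n$.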

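Theorem 5 (Upper rate of convergence). Let $\gamma \in (0, 1/2)$ and $\ell > 0$, let $r > 0$ and
$\lambda > 3$. Specify $(\hat f_{0,n}, \hat f_{1,n})$ with a kernel $K$ satisfying Assumption 4 for
some $n_0 > 0$, with

    $h_n \propto |\mathbb{T}_n|^{-1/(2\beta+1)}$

and $\varpi_n > 0$ such that $\varpi_n \to 0$ as $n \to \infty$. For every $L, L' > 0$ and
$0 < \beta < n_0$, for every $G$ such that
$(G_0, G_1) \in \big(\mathcal{G}(r, \lambda) \cap \mathcal{H}^{\beta}_{\mathbb{R}}(L')\big)^2$
satisfy Assumptions 2 and 3, there exists $d = d(\gamma, \ell, G_0, G_1) > 0$ such that for every
compact interval $\mathcal{D} \subset [-d, d]$ with nonempty interior and for every $x$ in the
interior of $\mathcal{D}$,

    $\sup_{(f_0, f_1)} \Big( \mathbb{E}_\mu\big[ (\hat f_{0,n}(x) - f_0(x))^2 + (\hat f_{1,n}(x) - f_1(x))^2 \big] \Big)^{1/2} \lesssim \varpi_n^{-1}\, |\mathbb{T}_n|^{-\beta/(2\beta+1)}$,

where the supremum is taken among all functions
$(f_0, f_1) \in \big(\mathcal{F}(\gamma, \ell) \cap \mathcal{H}^{\beta}_{\mathcal{D}}(L)\big)^2$,
for any initial probability measure $\mu(dx)$ on $\mathbb{R}$ for $X_\emptyset$ such that
$\mu\big((1 + |\cdot|)^2\big) < \infty$.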


Some comments are in order:

1) The noise density $G$ should be at least as smooth as the autoregressive functions $f_0$ and
$f_1$, in order to obtain the rate $|\mathbb{T}_n|^{-\beta/(2\beta+1)}$. Recall now that we build
Nadaraya-Watson type estimators: the estimator $\hat f_{\iota,n}$ is the quotient of an estimator of
$(\nu f_\iota)$ and an estimator of $\nu$. Thus the rate of convergence depends not only on the
regularity of $f_\iota$ but also on the regularity of $\nu$. Note that $G$ determines the regularity
of the mean transition $y \mapsto Q(x, y)$ (see (4)) and thus the regularity of the invariant
measure $\nu$. A more general result would establish the rate
$|\mathbb{T}_n|^{-\beta'/(2\beta'+1)}$ with $\beta'$ the minimal regularity between $f_\iota$ and
$\nu$ (and thus between $f_\iota$ and $G$ the noise density).

2) The autoregressive functions $f_0$ and $f_1$ should be locally smooth, on $\mathcal{D}$ a
vicinity of $x$ (and not necessarily globally smooth, on $\mathbb{R}$). Note that we could also
state an upper bound for
$\mathbb{E}_\mu\big[\int_{\mathcal{D}} (\hat f_{0,n}(x) - f_0(x))^2 + (\hat f_{1,n}(x) - f_1(x))^2\, dx\big]$.

3) Up to the factor $\varpi_n^{-1}$ we obtain the classical rate
$|\mathbb{T}_n|^{-\beta/(2\beta+1)}$ where $|\mathbb{T}_n|$ is the number of observed couples
$(X_u, X_{u\iota})$. We know it is optimal in a minimax sense in a density estimation framework and
we can infer this is optimal in our framework too. Proving it is the purpose of Theorem 6 which
follows.

4) We do not achieve adaptivity in the smoothness of the autoregressive functions since our choice
of bandwidth $h_n$ still depends on $\beta$. For classical autoregressive models (i.e.
nonbifurcating), adaptivity is achieved in a general framework in the early work by Hoffmann [31],
and we also refer to Delouille and von Sachs [18].

5) For the sake of simplicity, we have picked a common bandwidth to define the two estimators, but
one can immediately generalize our study for two different bandwidths
$(h_{\iota,n} = |\mathbb{T}_n|^{-1/(2\beta_\iota+1)}, \iota \in \{0,1\})$ where $\beta_\iota$ is the
Hölder smoothness of $f_\iota$.
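The bandwidth order used in Theorem 5 can be read as the usual bias-variance balance. The following heuristic is only a sketch and not the proof: the constants $C_1$ and $C_2$ are hypothetical, and the variance order $1/(|\mathbb{T}_n| h_n)$ is the one suggested by the normalisation $\sqrt{|\mathbb{T}_n| h_n}$ appearing in the asymptotic normality results of Section 3.

```latex
\begin{align*}
\text{bias}^2 + \text{variance}
  &\approx C_1\, h_n^{2\beta} + \frac{C_2}{|\mathbb{T}_n|\, h_n}, \\
h_n^{2\beta} \asymp \frac{1}{|\mathbb{T}_n|\, h_n}
  &\iff h_n \asymp |\mathbb{T}_n|^{-1/(2\beta+1)}, \\
\text{so that}\quad \big(h_n^{2\beta}\big)^{1/2}
  &\asymp |\mathbb{T}_n|^{-\beta/(2\beta+1)}.
\end{align*}
```

For example, $\beta = 2$ gives $h_n \asymp |\mathbb{T}_n|^{-1/5}$ and a root mean squared error of order $|\mathbb{T}_n|^{-2/5}$.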


We complete the previous theorem with 


Theorem 6 (Lower rate of convergence). Assume the noise density $G$ is a bivariate Gaussian
density. Let $\mathcal{D} \subset \mathbb{R}$ be a compact interval. For every $\gamma \in (0,1)$
and every positive $\ell, \beta, L$, there exists $C > 0$ such that, for every $x \in \mathcal{D}$,

    $\liminf_{n \to \infty}\ \inf_{(\hat f_{0,n}, \hat f_{1,n})}\ \sup_{(f_0, f_1)}\ \mathbb{P}\Big( |\mathbb{T}_n|^{\beta/(2\beta+1)} \big( |\hat f_{0,n}(x) - f_0(x)| + |\hat f_{1,n}(x) - f_1(x)| \big) \geq C \Big) > 0$,

where the supremum is taken among all functions
$(f_0, f_1) \in \big(\mathcal{F}(\gamma, \ell) \cap \mathcal{H}^{\beta}_{\mathcal{D}}(L)\big)^2$
and the infimum is taken among all estimators based on $(X_u, u \in \mathbb{T}_{n+1})$.

This result obviously implies a lower rate of convergence for the mean quadratic loss at point $x$.
We see that in a regular case, the Gaussian case, the lower and upper rates match.


3. Asymmetry test 

Testing in the context of nonparametric regression is a crucial point, especially in applied
contexts. The question of no effect in nonparametric regression was addressed early in Eubank and
LaRiccia [26]. We may also want to compare two regression curves nonparametrically and we refer
to Munk [35] and references therein. Specific tools have been developed to compare time series
(see for instance the recent work by Jin [32] among many others). The test we propose to study
asymmetry in a NBAR process is based on the following asymptotic study.


3.1. Preliminaries: asymptotic behaviour. The almost-sure convergence of the estimators of the
autoregressive functions is obtained in Proposition 7 for any $h_n \propto |\mathbb{T}_n|^{-\alpha}$
with $\alpha \in (0,1)$. Choosing $\alpha \geq 1/(2\beta+1)$, the estimator
$(\hat f_{0,n}(x), \hat f_{1,n}(x))$ recentered by $(f_0(x), f_1(x))$ and normalised by
$\sqrt{|\mathbb{T}_n| h_n}$ converges in distribution to a Gaussian law. Depending on whether
$\alpha > 1/(2\beta+1)$ or not, the limit Gaussian law is centered or not, as we state in
Theorems 8 and 9.

Proposition 7 (Almost sure convergence). In the same setting as in Theorem 5 with
$h_n \propto |\mathbb{T}_n|^{-\alpha}$ for $\alpha \in (0,1)$,

    $\begin{pmatrix} \hat f_{0,n}(x) \\ \hat f_{1,n}(x) \end{pmatrix} \to \begin{pmatrix} f_0(x) \\ f_1(x) \end{pmatrix}$, $\mathbb{P}_\mu$-a.s.

as $n \to +\infty$.


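From now on we need to reinforce the assumption on the noise sequence: we require that the
noise $(\varepsilon_0, \varepsilon_1)$ has finite moment of order 4,
$\mathbb{E}[\varepsilon_0^4 + \varepsilon_1^4] < \infty$, which is guaranteed by
$(G_0, G_1) \in \mathcal{G}(r, \lambda)^2$ for $\lambda > 5$. We use the notation
$|K|_2^2 = \int_{\mathbb{R}} K(x)^2\,dx$.

Theorem 8 (Asymptotic normality). In the same setting as in Theorem 5 with $\lambda > 5$ and
$h_n \propto |\mathbb{T}_n|^{-\alpha}$ for $\alpha \in (1/(2\beta+1), 1)$,

    $\sqrt{|\mathbb{T}_n| h_n} \begin{pmatrix} \hat f_{0,n}(x) - f_0(x) \\ \hat f_{1,n}(x) - f_1(x) \end{pmatrix} \xrightarrow{d} \mathcal{N}_2\big(0_2, \Sigma_2(x)\big)$ with $\Sigma_2(x) = |K|_2^2\, (\nu(x))^{-1}\, \Gamma$,

$\Gamma$ being the noise covariance matrix and $0_2 = (0, 0)$. Moreover, for $x_1, \ldots, x_k$
distinct points in $\mathcal{D}$, the sequence

    $\left( \begin{pmatrix} \hat f_{0,n}(x_l) - f_0(x_l) \\ \hat f_{1,n}(x_l) - f_1(x_l) \end{pmatrix},\ l = 1, \ldots, k \right)$

is asymptotically independent.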


The restriction $\alpha > 1/(2\beta+1)$ in Theorem 8 prevents us from choosing
$h_n \propto |\mathbb{T}_n|^{-1/(2\beta+1)}$, which is the optimal choice to achieve the minimax
rate as we have seen in Theorem 5. The following theorem remedies this flaw, but at the cost of an
unknown bias. We obtain an explicit expression of this bias for $\beta$ an integer, which depends on
the $\beta$-th derivatives of the autoregressive functions and the invariant measure of the
tagged-branch chain.


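Theorem 9 (Asymptotic normality with bias expression). In the same setting as in Theorem 5
with $\lambda > 5$ and $\beta$ an integer,

(i) if $h_n^{\beta} \sqrt{|\mathbb{T}_n| h_n} \to \kappa$ with $\kappa \in [0, \infty)$ as
$n \to +\infty$, then

    $\sqrt{|\mathbb{T}_n| h_n} \begin{pmatrix} \hat f_{0,n}(x) - f_0(x) \\ \hat f_{1,n}(x) - f_1(x) \end{pmatrix} \xrightarrow{d} \mathcal{N}_2\big(\kappa\, m_2(x), \Sigma_2(x)\big)$ with $\Sigma_2(x) = |K|_2^2\, (\nu(x))^{-1}\, \Gamma$

and

    $m_2(x) = \dfrac{(-1)^{\beta}}{\beta!\, \nu(x)} \displaystyle\int_{\mathbb{R}} y^{\beta} K(y)\,dy \begin{pmatrix} (\nu f_0)^{(\beta)}(x) - \nu^{(\beta)}(x) f_0(x) \\ (\nu f_1)^{(\beta)}(x) - \nu^{(\beta)}(x) f_1(x) \end{pmatrix}$;

(ii) if $h_n^{\beta} \sqrt{|\mathbb{T}_n| h_n} \to +\infty$ as $n \to +\infty$, then

    $h_n^{-\beta} \begin{pmatrix} \hat f_{0,n}(x) - f_0(x) \\ \hat f_{1,n}(x) - f_1(x) \end{pmatrix} \to m_2(x)$ in $\mathbb{P}_\mu$-probability.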


3.2. Construction of an asymmetry test. Let $x_1, \ldots, x_k$ be distinct points in $\mathbb{R}$.
We are going to build a statistical test that allows us to discriminate between the hypotheses

    $H_0 : \forall l \in \{1, \ldots, k\},\ f_0(x_l) = f_1(x_l)$ vs. $H_1 : \exists l \in \{1, \ldots, k\},\ f_0(x_l) \neq f_1(x_l)$.

In the parametric studies on E. coli [27, 17], these tests are known as detection of cellular aging
and they make it possible to decide whether the cell division is symmetric or asymmetric.


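Construction of an asymptotically unbiased estimator. Inspired by Bierens [9] we define new
estimators in order to both achieve the rate $|\mathbb{T}_n|^{\beta/(2\beta+1)}$ in the asymptotic
normality property and remove the asymptotic bias. Let
$(\hat f^{(a)}_{\iota,n}(x), \iota \in \{0,1\})$ be the estimators (1) with bandwidth
$h_n^{(a)} \propto |\mathbb{T}_n|^{-1/(2\beta+1)}$ and
$(\hat f^{(b)}_{\iota,n}(x), \iota \in \{0,1\})$ be the estimators (1) with bandwidth
$h_n^{(b)} \propto |\mathbb{T}_n|^{-\delta/(2\beta+1)}$ for some $\delta \in (0,1)$. We define

(9)    $\bar f_{\iota,n}(x) = \big(1 - |\mathbb{T}_n|^{-\beta(1-\delta)/(2\beta+1)}\big)^{-1} \Big( \hat f^{(a)}_{\iota,n}(x) - |\mathbb{T}_n|^{-\beta(1-\delta)/(2\beta+1)}\, \hat f^{(b)}_{\iota,n}(x) \Big)$, $\iota \in \{0,1\}$.

Corollary 10. In the same setting as in Theorem 9,

    $|\mathbb{T}_n|^{\beta/(2\beta+1)} \begin{pmatrix} \bar f_{0,n}(x) - f_0(x) \\ \bar f_{1,n}(x) - f_1(x) \end{pmatrix} \xrightarrow{d} \mathcal{N}_2\big(0_2, \Sigma_2(x)\big)$ with $\Sigma_2(x) = |K|_2^2\, (\nu(x))^{-1}\, \Gamma$,

for every $\delta \in (0,1)$ in the definition (9) of $(\bar f_{0,n}(x), \bar f_{1,n}(x))$.

As announced, the trick of Bierens [9] enables us to remove the unknown asymptotic bias while
keeping the optimal rate of convergence.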
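The bias removal can be checked on a toy computation. The sketch below assumes that the two preliminary estimates carry leading biases $h^{\beta} m$ with bandwidths exactly $|\mathbb{T}_n|^{-1/(2\beta+1)}$ and $|\mathbb{T}_n|^{-\delta/(2\beta+1)}$, and that the combination weight is $w = |\mathbb{T}_n|^{-\beta(1-\delta)/(2\beta+1)}$, the ratio of the two bias orders; the function name and arguments are hypothetical.

```python
def bierens_combine(est_a, est_b, n_obs, beta, delta):
    """Combine two kernel estimates so that their leading biases cancel.

    est_a is computed with bandwidth n_obs**(-1/(2*beta+1)) and est_b with
    n_obs**(-delta/(2*beta+1)); w equals (h_a / h_b)**beta.
    """
    w = n_obs ** (-beta * (1.0 - delta) / (2.0 * beta + 1.0))
    return (est_a - w * est_b) / (1.0 - w)


# Toy check: two estimates of the same value with biases m * h**beta.
n_obs, beta, delta = 2047, 2.0, 0.5
h_a = n_obs ** (-1.0 / (2 * beta + 1))
h_b = n_obs ** (-delta / (2 * beta + 1))
truth, m = 0.7, 1.3
combined = bierens_combine(truth + m * h_a ** beta, truth + m * h_b ** beta,
                           n_obs, beta, delta)
```

In this idealised setting `combined` equals `truth` up to rounding, since $h_a^{\beta} - w\, h_b^{\beta} = 0$.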


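Test statistic. We define a test statistic based on the new estimators
$(\bar f_{0,n}, \bar f_{1,n})$ by

(10)    $W_n(x_1, \ldots, x_k) = \dfrac{|\mathbb{T}_n|^{2\beta/(2\beta+1)}}{(\sigma_0^2 + \sigma_1^2 - 2\sigma_0\sigma_1\varrho)\, |K|_2^2} \displaystyle\sum_{l=1}^{k} \hat\nu_n(x_l) \big( \bar f_{0,n}(x_l) - \bar f_{1,n}(x_l) \big)^2$

with $\hat\nu_n(\cdot) = |\mathbb{T}_n|^{-1} \sum_{u \in \mathbb{T}_n} K_{h_n}(X_u - \cdot)$ where
$h_n \propto |\mathbb{T}_n|^{-1/(2\beta+1)}$.

Theorem 11 (Wald test for asymmetry). In the same setting as in Theorem 9, let $x_1, \ldots, x_k$
be distinct points in $\mathcal{D}$. Then the test statistic $W_n(x_1, \ldots, x_k)$ converges in
distribution to the chi-squared distribution with $k$ degrees of freedom $\chi^2(k)$, under $H_0$,
and $\mathbb{P}_\mu$-almost surely to $+\infty$, under $H_1$.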


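Note that in (10) we could replace $\sigma_0^2$, $\sigma_1^2$ and $\varrho$ by

(11)    $\hat\sigma_{0,n}^2 = |\mathbb{T}_n|^{-1} \sum_{u \in \mathbb{T}_n} \hat\varepsilon_{u0}^2$, $\hat\sigma_{1,n}^2 = |\mathbb{T}_n|^{-1} \sum_{u \in \mathbb{T}_n} \hat\varepsilon_{u1}^2$, $\hat\varrho_n = \hat\sigma_{0,n}^{-1} \hat\sigma_{1,n}^{-1}\, |\mathbb{T}_n|^{-1} \sum_{u \in \mathbb{T}_n} \hat\varepsilon_{u0} \hat\varepsilon_{u1}$,

with the empirical residuals $\hat\varepsilon_{u\iota} = X_{u\iota} - \hat f_{\iota,n}(X_u)$ for
$u \in \mathbb{T}_n$. We claim that these estimators are consistent, so that Theorem 11 is still
valid for this new test statistic. Proving the convergence in probability of these three quantities
would involve some technical calculations and we do not give more details here.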
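As an illustration of how the test statistic can be assembled once the estimates are available, here is a minimal sketch. It assumes the reading of (10) as a density-weighted sum of squared differences divided by the asymptotic variance of the difference, plug-in residual moments for the noise covariance, and a Gaussian kernel for which $|K|_2^2 = 1/(2\sqrt{\pi})$; all function and argument names are hypothetical.

```python
import math

# Integral of K^2 for the standard Gaussian kernel.
GAUSSIAN_KERNEL_L2_SQ = 1.0 / (2.0 * math.sqrt(math.pi))


def noise_covariance(res0, res1):
    """Empirical sigma_0^2, sigma_1^2 and correlation from residual pairs."""
    n = len(res0)
    s0 = sum(e * e for e in res0) / n
    s1 = sum(e * e for e in res1) / n
    rho = sum(a * b for a, b in zip(res0, res1)) / (n * math.sqrt(s0 * s1))
    return s0, s1, rho


def wald_statistic(f0_vals, f1_vals, nu_vals, n_obs, beta, s0, s1, rho,
                   kernel_l2_sq=GAUSSIAN_KERNEL_L2_SQ):
    """Wald-type statistic over grid points x_1, ..., x_k.

    f0_vals, f1_vals: bias-corrected estimates at the grid points;
    nu_vals: invariant density estimates at the same points;
    s0, s1: noise variances; rho: noise correlation.
    """
    scale = n_obs ** (2.0 * beta / (2.0 * beta + 1.0))
    var_diff = (s0 + s1 - 2.0 * rho * math.sqrt(s0 * s1)) * kernel_l2_sq
    total = sum(nu * (a - b) ** 2
                for a, b, nu in zip(f0_vals, f1_vals, nu_vals))
    return scale * total / var_diff
```

The value returned is then compared with an upper quantile of the chi-squared distribution with $k$ degrees of freedom, obtained for instance from `scipy.stats.chi2.ppf(0.95, k)` if SciPy is available.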
4. Numerical implementation


4.1. Simulated data. The goal of this section is to illustrate the theoretical results of the previous 
sections, in particular the results of Theorem 5 (Upper rate of convergence) and Theorem 11 (Wald 
test for asymmetry). 


Quality of the estimation procedure. We pick trial autoregressive functions defined analytically by

    $f_0(x) = x\big(1/4 + \exp(-x^2)/2\big)$ and $f_1(x) = x\big(1/8 + \exp(-x^2)/2\big)$

for $x \in \mathbb{R}$. We take a Gaussian noise with $\sigma_0^2 = \sigma_1^2 = 1$ and
$\varrho = 0.3$. We simulate $M = 500$ times a NBAR process up to generation $n + 1 = 15$, with
root $X_\emptyset = 1$. We take a Gaussian kernel $K(x) = (2\pi)^{-1/2} \exp(-x^2/2)$ and
$h_n = |\mathbb{T}_n|^{-1/5}$ in order to implement the estimators given by (1).

We evaluate $\hat f_{0,n}$ and $\hat f_{1,n}$ on a regular grid of $\mathcal{D} = [-3, 3]$ with mesh
$\Delta x$. We did not meet any problem with the denominator in practice and actually set
$\varpi_n = 0$. For each sample we compute the empirical error

    $e^{(i)}_\iota = \dfrac{\| \hat f^{(i)}_{\iota,n} - f_\iota \|_{\Delta x}}{\| f_\iota \|_{\Delta x}}$, $i = 1, \ldots, M$,

where $\| \cdot \|_{\Delta x}$ denotes the discrete norm over the numerical sampling. Table 1
displays the mean empirical errors together with the empirical standard deviations,
$\bar e_\iota = M^{-1} \sum_{i=1}^{M} e^{(i)}_\iota$ and
$\big( M^{-1} \sum_{i=1}^{M} (e^{(i)}_\iota - \bar e_\iota)^2 \big)^{1/2}$ for $\iota \in \{0,1\}$.
The larger $n$, the better the reconstruction of $f_0$ and $f_1$, as shown in Table 1.


    n          8        9        10       11       12       13       14
    |T_n|      511      1 023    2 047    4 095    8 191    16 383   32 767
    e_0        0.4442   0.3417   0.2633   0.2006   0.1517   0.1285   0.0891
    sd.        0.1509   0.1063   0.0761   0.0558   0.0387   0.0295   0.0209
    e_1        0.6696   0.5141   0.4006   0.3027   0.2356   0.1776   0.1384
    sd.        0.2482   0.1626   0.1227   0.0831   0.0622   0.0440   0.0326

Table 1. [Simulated data] Mean empirical relative error $\bar e_0$ (resp. $\bar e_1$) and its
standard deviation computed over $M = 500$ Monte-Carlo trees, with respect to
$|\mathbb{T}_n|$, for the autoregressive function $f_0$ (resp. $f_1$) reconstructed over the
interval $\mathcal{D} = [-3, 3]$ by the estimator $\hat f_{0,n}$ (resp. $\hat f_{1,n}$).


This is also true at a visual level, as shown on Figure 1 where 95%-level confidence bands
are built so that for each point $x$, the lower and upper bounds include 95% of the estimators
$(\hat f^{(i)}_{0,n}(x), i = 1, \ldots, M)$. As one can see on Figure 1, the reconstruction is good
around 0 and deteriorates for large or small $x$. Indeed the invariant measure estimator shows that
its mass is located around 0 and thus more observations lie in this zone, which ensures a better
reconstruction there. The same analysis holds for the reconstruction of $f_1$, see the thin blue
lines.

The error is close to $|\mathbb{T}_n|^{-2/5}$ for both $\hat f_{0,n}$ and $\hat f_{1,n}$ as
expected: indeed, for a kernel of order $n_0$, the bias term in density estimation is of order
$h^{\beta \wedge (n_0+1)}$. For the smooth $f_0$, $f_1$ and $\nu$ we have here, we rather expect the
rate $|\mathbb{T}_n|^{-(n_0+1)/(2(n_0+1)+1)} = |\mathbb{T}_n|^{-2/5}$ for the Gaussian kernel with
$n_0 = 1$ that we use here, and this is consistent with what we observe on Figure 2.
Figure 1. [Simulated data] Reconstruction of $f_0$ over $\mathcal{D} = [-3, 3]$ with 95%-level
confidence bands constructed over $M = 500$ Monte Carlo trees. In bold red line:
$x \mapsto f_0(x)$. In thin blue lines: reconstruction of $f_1$ with 95%-level confidence
bands. Left: $n = 10$ generations. Right: $n = 14$ generations.

Figure 2. [Simulated data] The log-average relative empirical error over $M = 500$
Monte Carlo trees vs. $\log(|\mathbb{T}_n|)$ for $f_0$ (resp. $f_1$) reconstructed over
$\mathcal{D} = [-3, 3]$ by $\hat f_{0,n}$ (solid blue line) (resp. $\hat f_{1,n}$ (dashed blue
line)) compared to the expected log-rate (solid red line). Horizontal axis: number of observations
$|\mathbb{T}_n|$ (log-scale).


Implementation of the asymmetry test. We now implement the estimators (9) inspired by [9] in
order to compute our test statistic (10). We keep a Gaussian kernel and we pick
$h_n^{(a)} = |\mathbb{T}_n|^{-1/5}$ and $h_n^{(b)} = |\mathbb{T}_n|^{-1/10}$ (i.e. $\delta = 1/2$,
and the choice of $\delta$ proves to have no influence on our numerical results). The numerical
study of $\bar f_{0,n}$ and $\bar f_{1,n}$ leads to results similar to those of the previous study.
For a given grid $\{x_1, \ldots, x_k\}$ of $\mathcal{D} = [-3, 3]$, we reject the null hypothesis
$H_0$ if $W_n(x_1, \ldots, x_k)$ exceeds the 95%-quantile of the chi-squared distribution with $k$
degrees of freedom and thus obtain a test with asymptotic level 5%. We measure the quality of our
test procedure by computing the proportion of rejections of the null.

We first implement the two following cases:

    (Case $=$)     $f_0(x) = f_1(x) = x\big(1/4 + \exp(-x^2)/2\big)$,
    (Case $\neq$)  $f_0(x) = x\big(1/4 + \exp(-x^2)/2\big)$ and $f_1(x) = x\big(1/8 + \exp(-x^2)/2\big)$.

The test should reject $H_0$ in the second case but not in the first one. The larger $n$, the
better the test, as one can see in Table 2: $H_0$ is more and more often rejected for
(Case $\neq$) and less and less often rejected for (Case $=$) as $n$ increases, which is what we
expect. We also observe the influence of the number of grid points used to build the test
statistic. Three grids of $\mathcal{D} = [-3, 3]$ are tested with $k = 13$, 25 and 61 points. The
larger the number of points, the larger the proportion of rejections of $H_0$ in both cases.
However, for (Case $=$), more than $n = 14$ generations are needed to reach the theoretical
asymptotic level of 5%. The choice of the number $k$ of points is delicate but we would recommend
using a low $k$ to build the test.


    n                           8       9       10      11      12      13      14
    |T_n|                       511     1 023   2 047   4 095   8 191   16 383  32 767
    Case ≠    Δx = 0.5          46.8%   67.2%   87.6%   99.0%   100%    100%    100%
              Δx = 0.25         59.6%   77.8%   92.8%   99.8%   100%    100%    100%
              Δx = 0.1          67.8%   85.4%   95.6%   99.8%   100%    100%    100%
    Case =    Δx = 0.5          19.6%   18.6%   18.2%   16.2%   13.4%   14.8%   12.4%
              Δx = 0.25         30.4%   30.0%   29.0%   24.8%   21.4%   19.4%   19.8%
              Δx = 0.1          42.6%   42.6%   40.4%   39.8%   35%     31.6%   32.2%

Table 2. [Simulated data] Proportions of rejections of the null hypothesis
$H_0 : \{\forall l = 1, \ldots, k,\ f_0(x_l) = f_1(x_l)\}$ for 5% asymptotic level tests over
$M = 500$ Monte-Carlo trees. The test is based on the test statistic $W_n(x_1, \ldots, x_k)$ (10)
with the grids $\{x_l = -3 + (l-1)\Delta x \leq 3;\ l \geq 1\}$ for
$\Delta x \in \{0.5; 0.25; 0.1\}$ (i.e. $k = 13$, 25 and 61 points). (Case $\neq$): the proportions
(power of the test) should be high. (Case $=$): the proportions (type I error) should be low.


The second numerical test aims at studying empirically the power of our test. We keep
the same autoregressive function $f_0$ for cells of type 0 and parametrize the autoregressive
function for cells of type 1 so that it progressively comes closer to $f_0$:

    $f_0(x) = x\big(1/4 + \exp(-x^2)/2\big)$ and $f_{1,\tau}(x) = x\big(\tau + \exp(-x^2)/2\big)$

for $\tau \in [1/8, 1/4]$. This choice enables us to interpolate between (Case $\neq$) and
(Case $=$).

As $\tau$ becomes closer to 1/4, i.e. as $f_{1,\tau}$ becomes closer to $f_0$, we see the
proportion of rejections of the null decrease on Figure 3. The steeper the decrease, the better our
test performs. The proportion of rejections of $H_0$ is higher than 40% only for $\tau$ up to
0.1875 for a reasonable number of observations ($|\mathbb{T}_n| = 2\,047$, on the left in
Figure 3). On the right in Figure 3, one can see the results for a larger number of observations,
$|\mathbb{T}_n| = 32\,767$: the performance is good for $\tau$ up to 0.225, i.e. closer to the
equality case $\tau = 1/4$.

4.2. Real data. Quoting Stewart et al. [40], "The bacterium E. coli grows in the form of a rod,
which reproduces by dividing in the middle. (...) one of the ends of each cell has just been created
during division (termed the new pole), and one is pre-existing from a previous division (termed the
old pole)." At each division, the cell inheriting the old pole of the progenitor cell is of type 1,
say, while the cell inheriting the new pole is of type 0. The individual feature we focus on is the
growth rate (E. coli is an exponentially growing cell). Stewart et al. [40] followed the growth of a
large number of micro-colonies of E. coli, measuring the growth rate of each cell up to 9
generations (possibly with some missing data).

Figure 3. [Simulated data] Proportions of rejections of the null hypothesis
$H_0 : \{\forall l \in \{1, \ldots, k\},\ f_0(x_l) = f_{1,\tau}(x_l)\}$ (power of the test) with
respect to $\tau \in [1/8; 1/4]$ for 5% asymptotic level tests over $M = 500$ Monte-Carlo trees.
The test is based on the test statistic $W_n(x_1, \ldots, x_k)$ (10) with the grid
$\{x_l = -3 + (l-1)\Delta x \leq 3;\ l \geq 1\}$ for $\Delta x = 0.5$ (i.e. $k = 13$ points).
Left: $n = 10$ generations. Right: $n = 14$ generations. Horizontal axis: similarity parameter
between $f_0$ and $f_1$, from 0.125 to 0.25. Vertical axis: proportion of rejections, in percent.

Recently, concerning the data set of Stewart et al., Delyon et al. [19] found out that ’’There 
is no stationarity of the growth rate across generations. This means that the initial stress of the 
experiment has not the time to vanish during only the first 9 generations.” As a consequence, 
we should not aggregate the data from the different micro-colonies and we shall only work on 
the largest samples. The largest genealogical tree counts 655 cells for which we know both the 
type and the growth rate, together with the type and the growth rate of the progenitor. The 
autoregressive functions estimators {fo,n, fi,n) are represented on Figure 4(a)^. It is surprising that 
the relationship between the growth rates of the mother and its offspring may be decreasing. In this 
case, our nonparametric estimated curves are close to the linear estimated curves (computed using 
the estimators of Guyon [27]). We show a second example on Figure 4(b) where the relationship 
may not be linear. 

Previously, Guyon et al. [28] and de Saporta et al. [21, 23] carried out asymmetry tests, and
our conclusions seem to coincide with the previous ones. We implement our test statistic (10) for
the largest tree, using 10 equidistant points of $\mathcal D = [0.0326; 0.0407]$, where 80% of the growth
rates lie, using the covariance matrix estimator (11). For the largest tree of 655 cells, our test
strongly rejects the null hypothesis (p-value < 10“^). In the same way, the null hypothesis is
strongly rejected for the 10 largest trees available (from 443 cells to 655). Thus, we may conclude
that there is asymmetry in the transmission of the growth rate from one cell to its offspring. Admittedly,
our test does not take into account the influence of missing data and the level of our test for small
trees is poor (recall Table 2, (Case =), Column n = 8). Thus one should take this conclusion with
caution.

^We keep on with a Gaussian kernel, the bandwidths are picked proportional to and g ^ _ 1 / 2 )
with N = 655, up to a constant fixed using the rule of Silverman. We underline that we do not observe the full tree
$\mathbb T_9$, and we compute our estimators accordingly. Point-wise confidence intervals of asymptotic level 95% built using
Corollary 10 overlap the curves $x \mapsto \hat f_{0,n}(x)$ and $x \mapsto \hat f_{1,n}(x)$, and are far too optimistic (since $n \le 9$).



(a) N = 655    (b) N = 446

Figure 4. [Real data from Stewart et al. [40]] Points $(X_u, X_{u0})$ and $(X_u, X_{u1})$
for N cells $u \in \mathbb T_9$. Bold green curve (resp. thin green line): reconstruction of
$x \mapsto f_0(x)$ with $x \mapsto \hat f_{0,n}(x)$ (resp. with a linear estimator). Bold red curve (resp.
thin red line): reconstruction of $x \mapsto f_1(x)$ with $x \mapsto \hat f_{1,n}(x)$ (resp. with a linear
estimator).


5. Discussion 


Recursive estimators. We could estimate $(f_0(x), f_1(x))$, the autoregressive functions at point $x \in \mathcal D$,
from the observations $(X_u)_{u \in \mathbb T_{n+1}}$ by

$\tilde f_{\iota,n}(x) = \dfrac{\sum_{m=0}^{n} \sum_{u \in \mathbb G_m} K_{h_m}(x - X_u)\, X_{u\iota}}{\sum_{m=0}^{n} \sum_{u \in \mathbb G_m} K_{h_m}(x - X_u)}, \quad \iota \in \{0,1\},$

with the collection of bandwidths $(h_m = |\mathbb G_m|^{-\alpha})_{0 \le m \le n}$ for $\alpha \in (0,1)$. These estimators can be
seen as a version of recursive Nadaraya-Watson estimators when the index set has a binary tree
structure. We stress that our results also hold for this alternative procedure.
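
The recursive procedure above can be sketched in a few lines. This is only an illustration under assumptions that are not in the text: a Gaussian kernel, independent standard Gaussian noises for the two children, and hypothetical helper names (`simulate_tree`, `recursive_nw`); generation $m$ is stored as a list in which node $j$ has children $2j$ and $2j+1$ in the next list.

```python
import math
import random


def gaussian_kernel(t):
    """Standard Gaussian density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def simulate_tree(n, f0, f1, rng, x_root=0.0):
    """Return generations[m] = values of X_u for u in G_m, m = 0..n+1.

    Illustrative model only: independent N(0, 1) noises for both children.
    """
    generations = [[x_root]]
    for _ in range(n + 1):
        nxt = []
        for x in generations[-1]:
            nxt.append(f0(x) + rng.gauss(0.0, 1.0))  # child u0
            nxt.append(f1(x) + rng.gauss(0.0, 1.0))  # child u1
        generations.append(nxt)
    return generations


def recursive_nw(x, generations, alpha=0.2):
    """Recursive Nadaraya-Watson estimates of (f_0(x), f_1(x)).

    Generation m uses its own bandwidth h_m = |G_m|^(-alpha) = 2^(-alpha * m).
    """
    n = len(generations) - 2
    num = [0.0, 0.0]
    den = 0.0
    for m in range(n + 1):
        h_m = float(2 ** m) ** (-alpha)
        parents, children = generations[m], generations[m + 1]
        for j, x_u in enumerate(parents):
            w = gaussian_kernel((x - x_u) / h_m) / h_m  # K_{h_m}(x - X_u)
            den += w
            num[0] += w * children[2 * j]
            num[1] += w * children[2 * j + 1]
    if den == 0.0:
        return float("nan"), float("nan")
    return num[0] / den, num[1] / den
```

A call such as `recursive_nw(0.0, simulate_tree(10, f0, f1, random.Random(0)))` returns the pair of estimates at $x = 0$. Because each generation contributes with its own bandwidth, the numerator and denominator can be updated when a new generation is observed without revisiting earlier ones.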


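Heteroscedasticity. Given two functions $\sigma_0, \sigma_1 : \mathbb R \to [0, \infty)$, we could consider the generalized
autoregressive equations

$X_{u0} = f_0(X_u) + \sigma_0(X_u)\varepsilon_{u0}$ and $X_{u1} = f_1(X_u) + \sigma_1(X_u)\varepsilon_{u1}$

with $\mathbb E[\varepsilon_{u0}^2] = \mathbb E[\varepsilon_{u1}^2] = 1$ and $\mathbb E[\varepsilon_{u0}\varepsilon_{u1}] = \varrho$ where $\varrho \in (-1,1)$. Assuming $0 < \inf_{x \in \mathbb R} \sigma_\iota(x) \le \sup_{x \in \mathbb R} \sigma_\iota(x) < \infty$ for $\iota \in \{0,1\}$, Theorems 8 and 9 still hold with

$\Sigma_2(x) = |K|_2^2\,(\nu(x))^{-1} \begin{pmatrix} \sigma_0^2(x) & \varrho\,\sigma_0(x)\sigma_1(x) \\ \varrho\,\sigma_0(x)\sigma_1(x) & \sigma_1^2(x) \end{pmatrix}.$

The estimation of the variance functions $\sigma_0^2$ and $\sigma_1^2$ would be interesting but the theoretical study
of such estimators lies beyond the scope of this work.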









Uniform test. The asymmetry test we have built is based on the choice of a grid of points on $\mathbb R$.
A theoretical result is needed in order to build a uniform test on an interval $\mathcal D \subset \mathbb R$. More precisely,
to achieve such a uniform test we should study the asymptotic behaviour of

$\sup_{x \in \mathcal D} |\hat f_{\iota,n}(x) - f_\iota(x)|, \quad \iota \in \{0,1\}.$

This asymptotic study lies in the scope of the theory of extrema. One can see the study of Liu and 
Wu [33] for autoregressive processes of order 1: an asymptotic Gumbel behaviour is highlighted for 
the Nadaraya-Watson type estimator of the autoregressive function. Alternatively, studying the 
limit distribution of 

[ rG{0,1}, 

Jv 

we could derive another criterion to discriminate between $f_0 = f_1$ and $f_0 \neq f_1$.

Moderate deviations principle. The work of Bitseki Penda et al. [13] brings deviations inequalities
which enable us to derive a moderate deviations principle for the estimators $(\hat f_{0,n}(x), \hat f_{1,n}(x))$. The
results of [13] are valid under a uniform ergodicity assumption for the tagged-branch chain, which
can be achieved restricting ourselves to the class $\mathcal F(\gamma = 0, \ell)$.


6. Proofs 


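The notation $\lesssim$ means up to some constant independent of $n$ and uniform on the class $(f_0, f_1) \in \mathcal F(\gamma,\ell)^2$.

For a $\mathfrak B$-measurable function $g : \mathbb R \to \mathbb R$ and a measure $\mu$ on $(\mathbb R, \mathfrak B)$ we define $\mu(g) = \int_{\mathbb R} g(x)\mu(dx)$. For $\mathcal K \subset \mathbb R$ let

$|g|_1 = \int_{\mathbb R} |g(y)|\,dy, \quad |g|_2^2 = \int_{\mathbb R} g(y)^2\,dy, \quad |g|_{\mathcal K} = \sup_{y \in \mathcal K} |g(y)|$

and $|g|_\infty = |g|_{\mathbb R}$. For a function $g : \mathbb R^2 \to \mathbb R$ and $\mathcal K, \mathcal K' \subset \mathbb R$ let

$|g|_{\mathcal K,\mathcal K'} = \sup_{(x,x') \in \mathcal K \times \mathcal K'} |g(x,x')|.$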

The following lemma is well-known in the general setting of bifurcating Markov chains including
our NBAR model (see Delmas and Marsalle [17], Lemma 2.1 and Guyon [27], Equation (7)) and
highlights the key role of the tagged-branch Markov chain. We prove it in the Appendix for the sake
of completeness. Introduce

(12) $\mathcal P(x, dy\,dz) = G(y - f_0(x), z - f_1(x))\,dy\,dz,$

a Markov kernel from $(\mathbb R, \mathfrak B)$ to $(\mathbb R \times \mathbb R, \mathfrak B \otimes \mathfrak B)$.


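Lemma 12 (Many-to-one formulae). Let $(X_u)_{u \in \mathbb T}$ be a NBAR process, with any initial probability
measure $\mu(dx)$ on $(\mathbb R, \mathfrak B)$ for $X_\emptyset$, such that $\mu((1 + |\cdot|)^2) < \infty$. Then for $g : \mathbb R \to \mathbb R$ such that
$|g(x)| \le 1 + |x|$ for any $x \in \mathbb R$, we have

(13) $\mathbb E_\mu\Big[\sum_{u \in \mathbb G_m} g(X_u)\Big] = |\mathbb G_m|\,\mathbb E_\mu[g(Y_m)] = |\mathbb G_m|\,\mu(Q^m g)$

with $Q$ defined by (4) and

(14) $\mathbb E_\mu\Big[\sum_{u \neq v \in \mathbb G_m} g(X_u) g(X_v)\Big] = |\mathbb G_m| \sum_{l=1}^{m} 2^{l-1} \mu\Big(Q^{m-l}\big(\mathcal P(Q^{l-1} g \otimes Q^{l-1} g)\big)\Big),$

with $\mathcal P$ defined by (12).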

6.1. Preliminary. Set $\mathbb V(x) = 1 + |x|$ for $x \in \mathbb R$. It plays the role of the Lyapunov function in the
following

Lemma 13 (Ergodicity). Let $\gamma \in (0,1/2)$ and $\ell > 0$, let $r > 0$ and $\lambda > 2$. For every $(f_0, f_1) \in \mathcal F(\gamma,\ell)^2$, for every $G$ such that $(G_0, G_1) \in \mathcal G(r,\lambda)^2$ satisfy Assumption 2, the Markov kernel $Q$
admits a unique invariant probability measure $\nu$ of the form $\nu(dx) = \nu(x)dx$ on $(\mathbb R, \mathfrak B)$. Moreover,
for every $G$ such that $(G_0, G_1) \in \mathcal G(r,\lambda)^2$ satisfy Assumption 2, there exist a constant $R > 0$ and
$\rho \in (0,1/2)$ such that

$\sup_{(f_0,f_1)} \sup_{|g| \le \mathbb V} |Q^m g(x) - \nu(g)| \le R\,\mathbb V(x)\rho^m, \quad x \in \mathbb R,\ m \ge 0,$

where the supremum is taken among all functions $(f_0, f_1) \in \mathcal F(\gamma,\ell)^2$ and among all functions
$g : \mathbb R \to \mathbb R$ which satisfy $|g(x)| \le \mathbb V(x)$ for all $x \in \mathbb R$.

Proof of Lemma 13. We shall rely on the results of Hairer and Mattingly [29].

Step 1. In order to make use of Theorem 1.2 of [29] we shall verify their Assumptions 1 and 2.
Since $Y_1 = f_{\iota_1}(Y_0) + \varepsilon'_1$ where $\iota_1$ is drawn according to the Bernoulli distribution with parameter
1/2 and $\varepsilon'_1$ has density $(G_0 + G_1)/2$, we get

$Q(|\cdot|)(x) = \mathbb E_x[|Y_1|] \le \mathbb E[|f_{\iota_1}(x)|] + \mathbb E[|\varepsilon'_1|] \le \gamma|x| + M_0$

using $(f_0, f_1) \in \mathcal F(\gamma,\ell)^2$ with $M_0 = \ell + \mathbb E[|\varepsilon'_1|]$ as defined previously. We have $\gamma \in (0,1)$ and
$M_0 > 0$, so that is Assumption 1 in [29] (with their $V(y) = |y|$).
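
The drift inequality of Step 1 can be checked numerically on a toy example. The sketch below is an illustration, not part of the proof: it assumes standard Gaussian marginals $G_0 = G_1$, reads the membership $(f_0, f_1) \in \mathcal F(\gamma,\ell)^2$ as $|f_\iota(x)| \le \gamma|x| + \ell$, and uses two hypothetical functions `f0`, `f1` chosen to satisfy that bound. The mean of $|\mu + \varepsilon|$ for Gaussian $\varepsilon$ is available in closed form (folded normal distribution).

```python
import math


def mean_abs_normal(mu, sigma=1.0):
    """E|Z| for Z ~ N(mu, sigma^2) (mean of the folded normal distribution)."""
    return (sigma * math.sqrt(2.0 / math.pi) * math.exp(-mu * mu / (2.0 * sigma * sigma))
            + mu * math.erf(mu / (sigma * math.sqrt(2.0))))


GAMMA, ELL = 0.4, 1.0  # illustrative values of gamma and ell


def f0(x):
    return GAMMA * x + ELL * math.cos(x)   # |f0(x)| <= GAMMA * |x| + ELL


def f1(x):
    return -GAMMA * x + 0.5 * ELL          # |f1(x)| <= GAMMA * |x| + ELL


def q_abs(x):
    """Q(|.|)(x) = (E|f0(x) + eps| + E|f1(x) + eps|) / 2 with eps ~ N(0, 1)."""
    return 0.5 * mean_abs_normal(f0(x)) + 0.5 * mean_abs_normal(f1(x))


M0 = ELL + mean_abs_normal(0.0)  # M_0 = ell + E|eps|


def drift_gap(x):
    """gamma * |x| + M_0 - Q(|.|)(x); nonnegative when the drift condition holds."""
    return GAMMA * abs(x) + M0 - q_abs(x)
```

Evaluating `drift_gap` on a grid of points gives nonnegative values, in line with the bound $Q(|\cdot|)(x) \le \gamma|x| + M_0$.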

Set $C = \{x \in \mathbb R;\ |x| \le M_1\}$ where $M_1$ comes from Assumption 2. For any $A \in \mathfrak B$ and $x \in C$,
using the expression of $Q$ given by (4),

$Q(x,A) \ge \tfrac12 \int_{A \cap C} G_0(y - f_0(x))\,dy + \tfrac12 \int_{A \cap C} G_1(y - f_1(x))\,dy.$

For $(x,y) \in C^2$, we have $|y - f_\iota(x)| \le (1 + \gamma)M_1 + \ell$ for $\iota \in \{0,1\}$. Thus

$\inf_{x \in C} Q(x,A) \ge 2M_1\,\delta((1 + \gamma)M_1 + \ell)\,\dfrac{|A \cap C|}{2M_1}, \quad \forall A \in \mathfrak B,$

where $\delta(\cdot)$ is defined by (6) and $|A|$ denotes the Lebesgue measure of $A \in \mathfrak B$. That is Assumption 2
in [29] with $\alpha = 2M_1\delta((1+\gamma)M_1+\ell) > 0$. The existence and uniqueness of an invariant probability
measure $\nu$ follows from Theorem 1.2 of [29]. Moreover $\nu$ is absolutely continuous with respect to
the Lebesgue measure, since $Q(x,dy)$ defined by (4) itself is absolutely continuous with respect to
the Lebesgue measure. By Assumption 2, for $M_1$ satisfying (7), there exists some $\alpha_0 \in (0,1/2)$
such that

(15) $\alpha = 2M_1\delta((1 + \gamma)M_1 + \ell) > 1/2 + \alpha_0.$

We set $\beta = \alpha_0/M_0$. For all $x \in \mathbb R$ we pick $\delta_x$ and $\nu$ for $\mu_1$ and $\mu_2$ in Theorem 1.3 of [29] and
apply it recursively. We conclude that for any function $g$ such that $|g(x)| \le (1 + \beta|x|)$ for all $x \in \mathbb R$,
for some positive constant $C$, we have

$|Q^m g(x) - \nu(g)| \le C\rho^m (1 + \beta|x|)$

with $C = 1 + \int_{\mathbb R} (1 + \beta|x|)\nu(x)\,dx < \infty$.

Step 2. A precise control of $\rho$ with respect to $\gamma$, $M_0$ and $\alpha$ is established in [29]. Set $\gamma_0 =
\gamma + 2M_0/M_1 + \eta \in (\gamma + 2M_0/M_1, 1)$ where $\eta$ comes from Assumption 2. Theorem 1.3 of [29] states
one can take

Condition (15) gives immediately $1 - (\alpha - \alpha_0) < 1/2$. Note that


2 + Mi/? 7 o ^ 1 

2 + M 1/3 ^ 2 


Ml > 


(1 + 2ao)Mo 


1 / 2 - 7 - 7 ? 

which is satisfied for $\alpha_0 \in (0,1/2)$ with our choice of $M_1$ (which satisfies (7)). Thus Assumption 2
guarantees we can take $\rho < 1/2$.

It immediately follows from Step 1 and Step 2 that, for any function $g$ such that $|g| \le \mathbb V$, for
some positive constant $R$, we have

$|Q^m g(x) - \nu(g)| \le R\rho^m\,\mathbb V(x)$

with $\mathbb V(x) = 1 + |x|$, as asserted. This bound holds uniformly over $(f_0, f_1) \in \mathcal F(\gamma,\ell)^2$ by construction
(the uniform choice of $\rho$ is guaranteed by Step 2 and for a uniform choice of $C$ in Step 1 recall that
$(G_0, G_1) \in \mathcal G(r,\lambda)^2$). □

Step 2 of this proof highlights that Assumption 2 is written to readily obtain $\rho < 1/2$. Note
that to prove the existence of some $\rho \in (0,1)$ in Step 1, one only needs the existence of some
$M_1 > 2M_0/(1 - \gamma)$ such that $2M_1\delta((1 + \gamma)M_1 + \ell) > 0$ with $\delta(\cdot)$ defined by (6).

6.2. Estimation of the density of the invariant measure. For $x \in \mathcal D$, set

(16) $\hat\nu_n(x) = \dfrac{1}{|\mathbb T_n|} \sum_{u \in \mathbb T_n} K_{h_n}(x - X_u),$

a kernel estimator of the density $\nu$ of the invariant measure of the tagged-branch chain $Y$ of
transition $Q$.
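
As an illustration of (16), here is a minimal sketch of the kernel density estimator. It assumes a Gaussian kernel and the usual scaling $K_h(\cdot) = h^{-1}K(h^{-1}\cdot)$; the function names are hypothetical, and `values` simply stands for the observed $(X_u)_{u \in \mathbb T_n}$ in any order.

```python
import math


def gaussian_kernel(t):
    """Standard Gaussian density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def density_estimate(x, values, h):
    """hat nu_n(x) = |T_n|^{-1} * sum_u K_h(x - X_u), with K_h(.) = K(./h) / h."""
    return sum(gaussian_kernel((x - v) / h) for v in values) / (h * len(values))
```

With a single observation at 0 and $h = 1$ the estimate at $x = 0$ equals $K(0) = 1/\sqrt{2\pi}$, and for any sample the estimate integrates to one because $K$ does.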

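Proposition 14. Let $\gamma \in (0,1/2)$ and $\ell > 0$, let $r > 0$ and $\lambda > 3$. Specify $\hat\nu_n$ with a kernel $K$
satisfying Assumption 4 for some $n_0 > 0$ and

$h_n \propto |\mathbb T_n|^{-1/(2\beta+1)}.$

For every $L' > 0$ and $0 < \beta < n_0$, for every $G$ such that $(G_0, G_1) \in (\mathcal G(r,\lambda) \cap \mathcal H^\beta(L'))^2$ satisfy
Assumptions 2 and 3, for every compact interval $\mathcal D \subset \mathbb R$ with nonempty interior and every $x$ in
the interior of $\mathcal D$,

$\sup_{(f_0,f_1)} \mathbb E_\mu\big[(\hat\nu_n(x) - \nu(x))^2\big] \lesssim |\mathbb T_n|^{-2\beta/(2\beta+1)}$

where the supremum is taken among all functions $(f_0, f_1) \in \mathcal F(\gamma,\ell)^2$, for any initial probability
measure $\mu(dx)$ on $\mathbb R$ for $X_\emptyset$ such that $\mu((1 + |\cdot|)^2) < \infty$.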

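Proof. The usual bias-variance decomposition can be written here as

$\mathbb E_\mu\big[(\hat\nu_n(x) - \nu(x))^2\big] = \mathbb E_\mu\Big[\Big(\dfrac{1}{|\mathbb T_n|} \sum_{u \in \mathbb T_n} K_{h_n}(x - X_u) - \nu(x)\Big)^2\Big]$
$\lesssim \mathbb E_\mu\Big[\Big(\dfrac{1}{|\mathbb T_n|} \sum_{u \in \mathbb T_n} K_{h_n}(x - X_u) - K_{h_n} \star \nu(x)\Big)^2\Big] + \big(K_{h_n} \star \nu(x) - \nu(x)\big)^2$

where $\star$ stands for the convolution. For $(G_0, G_1) \in (\mathcal G(r,\lambda) \cap \mathcal H^\beta(L'))^2$ we have $\nu \in \mathcal H^\beta(L'')$ for some $L'' > 0$:
since $\nu$ is invariant for $Q$, using (4) which defines $Q$, we can write

$\nu(y) = \int_{\mathbb R} \nu(x) Q(x,y)\,dx = \tfrac12 \int_{\mathbb R} \nu(x)\big(G_0(y - f_0(x)) + G_1(y - f_1(x))\big)\,dx,$

where we immediately see that the regularity of $\nu$ is inherited from the regularity of $G_0$ and $G_1$, the
marginals of the noise density $G$. By a Taylor expansion up to order $\lfloor\beta\rfloor$ (recall that the number
$n_0$ of vanishing moments of $K$ in Assumption 4 satisfies $n_0 > \beta$), we obtain

(17) $\big(K_{h_n} \star \nu(x) - \nu(x)\big)^2 \lesssim h_n^{2\beta},$

see for instance Proposition 1.2 in Tsybakov [42]. In addition, we claim that

(18) $\mathbb E_\mu\Big[\Big(\dfrac{1}{|\mathbb T_n|} \sum_{u \in \mathbb T_n} K_{h_n}(x - X_u) - K_{h_n} \star \nu(x)\Big)^2\Big] \lesssim (|\mathbb T_n| h_n)^{-1}.$

Choosing $h_n \propto |\mathbb T_n|^{-1/(2\beta+1)}$ brings the announced result. Let us now prove (18) in two steps.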
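
For the reader's convenience, the bandwidth choice can be read off by balancing the squared bias bound (17) against the variance bound (18). The following lines only spell out that elementary computation; $\asymp$ denotes equality up to constants.

```latex
h_n^{2\beta} \asymp (|\mathbb T_n|\, h_n)^{-1}
\iff h_n^{2\beta+1} \asymp |\mathbb T_n|^{-1}
\iff h_n \asymp |\mathbb T_n|^{-1/(2\beta+1)},
\qquad\text{so that}\qquad
h_n^{2\beta} + (|\mathbb T_n|\, h_n)^{-1} \asymp |\mathbb T_n|^{-2\beta/(2\beta+1)} .
```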

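Step 1. Result over one generation. We heavily rely on the following controls:

Lemma 15. Let $F$ be a bounded function with compact support and $G$ be a locally bounded function.
For $h_n > 0$ and $x$ in the interior of $\mathcal D$, we define the function $H_n : \mathbb R \to \mathbb R$ by

$H_n(\cdot) = F(h_n^{-1}(x - \cdot))\,G(\cdot).$

For $h_n$ such that $x - h_n\,\mathrm{supp}(F) \subset \mathcal D$, we have

(i) $|QH_n|_\infty \le h_n |F|_1 |G|_{\mathcal D} |Q|_{\mathbb R,\mathcal D}$ and $|\nu(H_n)| \le h_n |F|_1 |G|_{\mathcal D} |\nu|_{\mathcal D}$.

(ii) Under Assumption 2, for $m \ge 1$, $y \in \mathbb R$,

$|Q^m H_n(y) - \nu(H_n)| \lesssim h_n \wedge (\mathbb V(y)\rho^m)$

up to the constant $\max\{|F|_1 |G|_{\mathcal D}(|Q|_{\mathbb R,\mathcal D} + |\nu|_{\mathcal D}),\ R|F|_\infty |G|_{\mathcal D}\}$.

The proof of this lemma is postponed to the Appendix. Note that we have $|\nu|_{\mathcal D} \le |Q|_{\mathbb R,\mathcal D} \le |Q|_{\mathbb R,\mathbb R} \le
(|G_0|_\infty + |G_1|_\infty)/2 < \infty$ (recall that $\nu$ is invariant for $Q$ defined by (4) and $(G_0, G_1) \in \mathcal G(r,\lambda)^2$).
Set

$H_n(\cdot) = K(h_n^{-1}(x - \cdot))$ and $\widetilde H_n(\cdot) = H_n(\cdot) - \nu(H_n),$

with $h_n$ sufficiently small such that $x - h_n\,\mathrm{supp}(K) \subset \mathcal D$. Pick $m \ge 1$. On the one hand,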


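$\mathbb E_\mu\Big[\sum_{u \in \mathbb G_m} \widetilde H_n^2(X_u)\Big] = |\mathbb G_m|\,\mu(Q^m \widetilde H_n^2) \lesssim |\mathbb G_m| h_n$

relying on the many-to-one formula (13) and Lemma 15(i). Inspired by Doumic, Hoffmann, Krell
and Robert [24] (proof of Proposition 8), set $l^* = \lfloor |\log h_n| / |\log \rho| \rfloor$. Since $\mu\big(Q^{m-l}(\mathcal P(\mathbb V \otimes \mathbb V))\big) < \infty$
(use $\mathbb V(x) = 1 + |x|$ for $x \in \mathbb R$, $(f_0, f_1) \in \mathcal F(\gamma,\ell)^2$ and finally $\mu(\mathbb V^2) < \infty$, one can look at
Lemmae 25 and 26 of Guyon [27]), by the many-to-one formula (14),

$\mathbb E_\mu\Big[\sum_{u \neq v \in \mathbb G_m} \widetilde H_n(X_u) \widetilde H_n(X_v)\Big] \lesssim |\mathbb G_m| \Big(\sum_{l=1}^{l^*} 2^{l-1} h_n^2 + \sum_{l=l^*+1}^{m} 2^{l-1} \rho^{2(l-1)}\Big) \lesssim |\mathbb G_m| h_n$

using the first upper-bound given by Lemma 15(ii) for $l$ before $l^*$, the second one after $l^*$, and using
$\rho \in (0,1/2)$. Finally, we conclude that

$\mathbb E_\mu\Big[\Big(\sum_{u \in \mathbb G_m} \widetilde H_n(X_u)\Big)^2\Big] \lesssim |\mathbb G_m| h_n.$

Thus, we have uniformly over $(f_0, f_1) \in \mathcal F(\gamma,\ell)^2$,

(19) $\mathbb E_\mu\Big[\Big(\dfrac{1}{|\mathbb G_m|} \sum_{u \in \mathbb G_m} K_{h_n}(x - X_u) - K_{h_n} \star \nu(x)\Big)^2\Big] \lesssim (|\mathbb G_m| h_n)^{-1}, \quad m \ge 1.$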

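Step 2. Result over a subtree. We rely on the previous inequality (19). Decomposing by generation
and by the triangle inequality, we obtain

$\mathbb E_\mu\Big[\Big(\dfrac{1}{|\mathbb T_n|} \sum_{u \in \mathbb T_n} K_{h_n}(x - X_u) - K_{h_n} \star \nu(x)\Big)^2\Big]^{1/2} \le \sum_{m=0}^{n} \dfrac{|\mathbb G_m|}{|\mathbb T_n|}\, \mathbb E_\mu\Big[\Big(\dfrac{1}{|\mathbb G_m|} \sum_{u \in \mathbb G_m} K_{h_n}(x - X_u) - K_{h_n} \star \nu(x)\Big)^2\Big]^{1/2}$
$\lesssim \sum_{m=0}^{n} \dfrac{|\mathbb G_m|}{|\mathbb T_n|} (|\mathbb G_m| h_n)^{-1/2} \lesssim (|\mathbb T_n| h_n)^{-1/2}.$

This proves inequality (18) we claimed and the proof is now complete. Note that we have removed
the log-term which appears in Proposition 8 of Doumic et al. [24]. □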

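Proposition 16. In the same setting as in Proposition 14,

$\hat\nu_n(x) \to \nu(x), \quad \mathbb P_\mu\text{-a.s.}$

as $n \to \infty$.

Proof. Write

$\hat\nu_n(x) - \nu(x) = \Big(\dfrac{1}{|\mathbb T_n|} \sum_{u \in \mathbb T_n} K_{h_n}(x - X_u) - K_{h_n} \star \nu(x)\Big) + \big(K_{h_n} \star \nu(x) - \nu(x)\big).$

From (17), we deduce that

(20) $|K_{h_n} \star \nu(x) - \nu(x)| \to 0$ as $n \to \infty$.

Note that we could obtain (20) invoking the result stated by Theorem 2.1.1 of Prakasa Rao [37],
a result also known as the Bochner Lemma (see Section 7.1.2 of Duflo [25]). Using (18) for $h_n \propto
|\mathbb T_n|^{-\alpha}$ with $\alpha \in (0,1)$,

$\sum_{n \ge 0} \mathbb E_\mu\Big[\Big(\dfrac{1}{|\mathbb T_n|} \sum_{u \in \mathbb T_n} K_{h_n}(x - X_u) - K_{h_n} \star \nu(x)\Big)^2\Big] < \infty,$

and by the Borel-Cantelli lemma, we deduce that

$\dfrac{1}{|\mathbb T_n|} \sum_{u \in \mathbb T_n} K_{h_n}(x - X_u) - K_{h_n} \star \nu(x) \to 0, \quad \mathbb P_\mu\text{-a.s.}$

as $n \to \infty$. Thus $|\hat\nu_n(x) - \nu(x)| \to 0$, $\mathbb P_\mu$-a.s. as $n \to \infty$. □

6.3. Proof of Theorem 5. For $x$ in the interior of $\mathcal D$, for $\iota \in \{0,1\}$, we plan to use the decomposition

$\hat f_{\iota,n}(x) - f_\iota(x) = \dfrac{M_{\iota,n}(x) + L_{\iota,n}(x)}{\hat\nu_n(x) \vee \varpi_n} - \dfrac{(\nu f_\iota)(x)}{\nu(x)}$
$= \dfrac{M_{\iota,n}(x)}{\hat\nu_n(x) \vee \varpi_n} + \dfrac{L_{\iota,n}(x) - (\nu f_\iota)(x)}{\hat\nu_n(x) \vee \varpi_n} - \dfrac{\hat\nu_n(x) \vee \varpi_n - \nu(x)}{\hat\nu_n(x) \vee \varpi_n}\, f_\iota(x)$

with

(21) $M_{\iota,n}(x) = \dfrac{1}{|\mathbb T_n|} \sum_{u \in \mathbb T_n} K_{h_n}(x - X_u)\,\varepsilon_{u\iota},$

(22) $L_{\iota,n}(x) = \dfrac{1}{|\mathbb T_n|} \sum_{u \in \mathbb T_n} K_{h_n}(x - X_u)\,f_\iota(X_u).$

Thus

(23) $\mathbb E_\mu\big[(\hat f_{\iota,n}(x) - f_\iota(x))^2\big] \lesssim \varpi_n^{-2} \Big(\mathbb E_\mu\big[(M_{\iota,n}(x))^2\big] + \mathbb E_\mu\big[(L_{\iota,n}(x) - (\nu f_\iota)(x))^2\big] + \mathbb E_\mu\big[(\hat\nu_n(x) \vee \varpi_n - \nu(x))^2\big]\Big)$

using $|f_\iota|_{\mathcal D} < \infty$, uniformly over the class $\mathcal F(\gamma,\ell)$ for $\mathcal D$ compact interval. We successively treat
the three terms in Steps from 1 to 3.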

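Step 1. Term $M_{\iota,n}(x)$. For all $m \ge 1$ and $\iota \in \{0,1\}$ fixed, the sequence $(\varepsilon_{u\iota})_{u \in \mathbb G_m}$ is a family of
independent random variables such that $\mathbb E[\varepsilon_{u\iota}^2] = \sigma_\iota^2$. Thus

$\mathbb E_\mu\Big[\Big(\dfrac{1}{|\mathbb G_m|} \sum_{u \in \mathbb G_m} K_{h_n}(x - X_u)\,\varepsilon_{u\iota}\Big)^2\Big] = \dfrac{\sigma_\iota^2}{h_n^2 |\mathbb G_m|}\, \mathbb E_\mu\Big[\dfrac{1}{|\mathbb G_m|} \sum_{u \in \mathbb G_m} K^2(h_n^{-1}(x - X_u))\Big] = \dfrac{\sigma_\iota^2}{|\mathbb G_m| h_n^2}\, \mu\big(Q^m K^2(h_n^{-1}(x - \cdot))\big) \lesssim (|\mathbb G_m| h_n)^{-1}$

by the many-to-one formula (13) and using Lemma 15(i). The result over a subtree follows by the
triangle inequality (as in Step 2 of the proof of Proposition 14),

(24) $\mathbb E_\mu\big[(M_{\iota,n}(x))^2\big] \lesssim (|\mathbb T_n| h_n)^{-1}.$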

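Step 2. Term $L_{\iota,n}(x)$. By the usual bias-variance decomposition,

$\mathbb E_\mu\big[(L_{\iota,n}(x) - (\nu f_\iota)(x))^2\big] \lesssim \mathbb E_\mu\Big[\Big(\dfrac{1}{|\mathbb T_n|} \sum_{u \in \mathbb T_n} K_{h_n}(x - X_u) f_\iota(X_u) - K_{h_n} \star (\nu f_\iota)(x)\Big)^2\Big] + \big(K_{h_n} \star (\nu f_\iota)(x) - (\nu f_\iota)(x)\big)^2.$

First, since $(\nu f_\iota) \in \mathcal H^\beta(L'')$ for some constant $L'' > 0$ and since Assumption 4 is valid with $n_0 > \beta$,

$\big(K_{h_n} \star (\nu f_\iota)(x) - (\nu f_\iota)(x)\big)^2 \lesssim h_n^{2\beta}.$

Secondly, we do the same study as in the proof of Proposition 14 for

$H_n(\cdot) = K(h_n^{-1}(x - \cdot))\,f_\iota(\cdot),$

relying on Lemma 15 (using $|f_\iota|_{\mathcal D} < \infty$ uniformly over the class $\mathcal F(\gamma,\ell)$, with $\mathcal D$ compact interval),
with $h_n$ sufficiently small such that $x - h_n\,\mathrm{supp}(K) \subset \mathcal D$. We obtain

$\mathbb E_\mu\Big[\Big(\dfrac{1}{|\mathbb T_n|} \sum_{u \in \mathbb T_n} K_{h_n}(x - X_u) f_\iota(X_u) - K_{h_n} \star (\nu f_\iota)(x)\Big)^2\Big] \lesssim (|\mathbb T_n| h_n)^{-1}.$

Thus

(25) $\mathbb E_\mu\big[(L_{\iota,n}(x) - (\nu f_\iota)(x))^2\big] \lesssim h_n^{2\beta} + (|\mathbb T_n| h_n)^{-1}.$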

Step 3. Denominator $\hat\nu_n(x) \vee \varpi_n$. We prove the following lemma in the Appendix.

Lemma 17. Let $\gamma \in (0,1/2)$ and $\ell > 0$, let $r > 0$ and $\lambda > 3$. For every $G$ such that $(G_0, G_1) \in
\mathcal G(r,\lambda)^2$ satisfy Assumptions 2 and 3, there exists $d = d(\gamma, \ell, G_0, G_1) > 0$ such that for every
$\mathcal D \subset [-d, d]$,

$\inf_{(f_0,f_1)} \inf_{x \in \mathcal D} \nu(x) > 0$

where the infimum is taken among all functions $(f_0, f_1) \in \mathcal F(\gamma,\ell)^2$.

Relying first on Lemma 17, we choose $n$ large enough such that

(26) $0 < \varpi_n \le \tfrac12 \inf_{(f_0,f_1)} \inf_{x \in \mathcal D} \nu(x)$

and we get

(27) $\mathbb E_\mu\big[(\hat\nu_n(x) \vee \varpi_n - \nu(x))^2\big] \lesssim \mathbb E_\mu\big[(\hat\nu_n(x) - \nu(x))^2\big] \lesssim h_n^{2\beta} + (|\mathbb T_n| h_n)^{-1}$

uniformly over $(f_0, f_1) \in \mathcal F(\gamma,\ell)^2$, using the upper-bound obtained in the proof of Proposition 14
for the second inequality.

Finally, gathering (24), (25) and (27) in (23) and choosing $h_n \propto |\mathbb T_n|^{-1/(2\beta+1)}$ we obtain the
asserted result.


Remark 18. The threshold $\varpi_n$ should be chosen such that it inflates the upper-rate of convergence
of a slow factor only. Typically, $\varpi_n = (\ln n)^{-1}$ is suitable. Looking carefully at the proof, see (26),
we actually see that $\varpi_n \to 0$ as $n \to \infty$ is not necessary. One could choose $\varpi_n = \varpi$ with

$\varpi = \tfrac12 \inf_{(f_0,f_1)} \inf_{x \in \mathcal D} \nu(x) > 0$

where the infimum is taken among all $(f_0, f_1) \in \mathcal F(\gamma,\ell)^2$ and where $\varpi > 0$ is guaranteed by
Lemma 17. However, to calibrate in practice the threshold in such a way is not possible since we
cannot compute $\varpi$.
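
To make the role of the threshold concrete, here is a minimal sketch of the truncated Nadaraya-Watson quotient $(M_{\iota,n}(x) + L_{\iota,n}(x))/(\hat\nu_n(x) \vee \varpi_n)$ used throughout the proofs. The Gaussian kernel, the function name `truncated_nw` and the flat-list data layout (one entry per mother cell, with its two daughters) are assumptions made for the illustration.

```python
import math


def gaussian_kernel(t):
    """Standard Gaussian density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def truncated_nw(x, parents, children0, children1, h, threshold):
    """Return (f0_hat, f1_hat, nu_hat) at point x.

    parents[i] is X_u, children0[i] is X_{u0}, children1[i] is X_{u1}.
    The denominator is max(nu_hat, threshold), i.e. nu_hat v varpi_n.
    """
    n_obs = len(parents)
    weights = [gaussian_kernel((x - p) / h) / h for p in parents]  # K_h(x - X_u)
    nu_hat = sum(weights) / n_obs
    denom = max(nu_hat, threshold)
    f0_hat = sum(w * c for w, c in zip(weights, children0)) / n_obs / denom
    f1_hat = sum(w * c for w, c in zip(weights, children1)) / n_obs / denom
    return f0_hat, f1_hat, nu_hat
```

When `nu_hat` exceeds the threshold the output is the ordinary Nadaraya-Watson estimate; otherwise the quotient is shrunk, which keeps the estimator bounded at points where few observations fall.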


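6.4. Proof of Theorem 6. In the following, $\mathbb P^n_{f_0,f_1}$ will denote the law on $\mathbb R^{|\mathbb T_{n+1}|}$ of the vector
$(X_u, u \in \mathbb T_{n+1})$, NBAR process, in the sense of Definition 1, driven by the autoregressive functions
$f_0$ and $f_1$ with initial probability measure $\mu(dx)$ on $\mathbb R$ for $X_\emptyset$ and with a Gaussian noise, i.e.

$G(x,y) = (2\pi)^{-1} \big(\sigma_0^2\sigma_1^2(1 - \varrho^2)\big)^{-1/2} \exp\Big(-\dfrac{\sigma_1^2 x^2 - 2\sigma_0\sigma_1\varrho\, xy + \sigma_0^2 y^2}{2\sigma_0^2\sigma_1^2(1 - \varrho^2)}\Big), \quad (x,y) \in \mathbb R^2,$

with $\sigma_0, \sigma_1 > 0$ and $\varrho \in (-1,1)$. When $f_0 = f_1 = f$, we shorten $\mathbb P^n_{f,f}$ into $\mathbb P^n_f$. We denote by
$\mathbb E^n_f[\cdot]$ the expectation with respect to $\mathbb P^n_f$.

Step 1. Let $\delta > 0$. Fix $f_0 = f_1 = f^*$ with $f^* \in \mathcal F(\gamma,\ell) \cap \mathcal H^\beta(L - \delta)$ and $x \in \mathcal D$. Then, for large
enough $n$, setting $h_n \propto |\mathbb T_n|^{-1/(2\beta+1)}$, we construct a perturbation $(f_{0,n}, f_{1,n})$ of $(f_0, f_1)$ defined
by

$f_{0,n}(y) = f_{1,n}(y) = f_n^*(y) = f^*(y) + a\,h_n^\beta K(h_n^{-1}(x - y)), \quad y \in \mathbb R,$

for some smooth kernel $K$ with compact support such that $K(0) = 1$, and for some $a = a_{\delta,K} > 0$
chosen in such a way that $f_n^* \in \mathcal F(\gamma,\ell) \cap \mathcal H^\beta(L)$. Note that at point $y = x$,

$|f_{0,n}(x) - f_0(x)| = |f_{1,n}(x) - f_1(x)| = a_{\delta,K} h_n^\beta = a_{\delta,K} |\mathbb T_n|^{-\beta/(2\beta+1)}.$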
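
The perturbation of Step 1 is easy to visualise numerically. The sketch below is an illustration under assumptions: it uses the classical smooth bump $t \mapsto \exp(1 - 1/(1 - t^2))$ on $(-1,1)$ as one possible kernel with compact support and $K(0) = 1$, and hypothetical names `bump` and `perturbed`; it does not check membership in $\mathcal F(\gamma,\ell) \cap \mathcal H^\beta(L)$, which is what constrains the constant $a$.

```python
import math


def bump(t):
    """Smooth kernel supported on (-1, 1) with bump(0) = 1."""
    if abs(t) >= 1.0:
        return 0.0
    return math.exp(1.0 - 1.0 / (1.0 - t * t))


def perturbed(f_star, x, a, h, beta):
    """Return y -> f_star(y) + a * h^beta * K((x - y) / h), the local perturbation at x."""
    def f_n(y):
        return f_star(y) + a * h ** beta * bump((x - y) / h)
    return f_n
```

The two functions differ by exactly $a h_n^\beta$ at $y = x$ and coincide outside the window $(x - h_n, x + h_n)$, which is why the two hypotheses are hard to distinguish from the data.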


Step 2. In the sequel, to shorten expressions, we set
For arbitrary estimators $(\hat f_{0,n}(x), \hat f_{1,n}(x))$ and a constant $C > 0$, the maximal risk is bounded
below by

;(p;;,(|T„|^/(2/5+i)|(/o^„,I > c) +p;;.(|t„|'5/(2/5+i)|(/o„j,^„)_/*| > c)) 

1{|T„F/(Wi)|(/o_„,/i_„)-/*|>C} + ^{|T„F/(Wi)|(/o_„./i,„)-/.|>C}_ “ ~ ' 

By the triangle inequality, we have 

-r\ + |(/0.n,7l.n) " /„* | ) > 2|T„ 7/(27+!) |/* ( 3 ,) - f*{x)\ = 2as,K 

by Step 1, so if we now take C < as.K/4:, one of the two indicators within the expectation above 
must be equal to one with full P^.-probability. In that case, 


4' 

>-£7. 
~2 f 


max P;^(|T„|7/(27 +i)|( 7„„J^^„)_/| >C) > i(l- 


/£{/+/*} 

and Theorem 6 is thus proved if limsup„_,,oo ||P/* 




Step 3. By the Pinsker inequality, we have ||Py, — Pj. Htu < 
likelihood ratio can be written 


^]4tv < 1 . 


In^ 

!!! dPy. j 


\ 1/2 

1 and the log- 


E/. 


In 


/dP7. 


dF], 


= 




mGT„ 


G{X^ 0 - r{X4,X^,- f*{X4) 

G(x^o - f*{X 4 ,Xui - /*(x„))yj 


= - y: e;. 

mGT„ 


(tr^ — aocrip)euo + (cq — crocrip)e„i 


irn-mxu) 


, CTq-I-(T i - 2croO’ip f*\2tv \ 

+ 2alal{l-p4 ^ ^ ^ 

U 1! 1 «gt„ 


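since $G$ is chosen to be the bivariate Gaussian density and, under $\mathbb P^n_{f^*}$, we know $X_{u0} = f^*(X_u) + \varepsilon_{u0}$
and $X_{u1} = f^*(X_u) + \varepsilon_{u1}$. Recall now that $X_u$ is independent of $(\varepsilon_{u0}, \varepsilon_{u1})$ which is centered. Thus

(28) $\mathbb E^n_{f^*}\Big[\ln\Big(\dfrac{d\mathbb P^n_{f^*}}{d\mathbb P^n_{f_n^*}}\Big)\Big] = \dfrac{\sigma_0^2 + \sigma_1^2 - 2\sigma_0\sigma_1\varrho}{2\sigma_0^2\sigma_1^2(1 - \varrho^2)} \sum_{u \in \mathbb T_n} \mathbb E^n_{f^*}\big[(f^* - f_n^*)^2(X_u)\big] = \dfrac{\sigma_0^2 + \sigma_1^2 - 2\sigma_0\sigma_1\varrho}{2\sigma_0^2\sigma_1^2(1 - \varrho^2)} \sum_{m=0}^{n} |\mathbb G_m|\,\mu\big(Q_{f^*}^m((f^* - f_n^*)^2)\big)$

using the many-to-one formula (13), with

$Q_{f^*}(x,y) = \tfrac12 \big(G_0(y - f^*(x)) + G_1(y - f^*(x))\big)$

where $G_0$ and $G_1$ are the marginals of $G$. We have

$Q_{f^*}\big((f^* - f_n^*)^2\big)(y) = a_{\delta,K}^2 h_n^{2\beta} \int_{\mathbb R} K^2(h_n^{-1}(x - z))\,Q_{f^*}(y,z)\,dz \le a_{\delta,K}^2 |K|_2^2 |Q_{f^*}|_{\mathbb R,\mathbb R}\, h_n^{2\beta+1}$

where $|Q_{f^*}|_{\mathbb R,\mathbb R}$ only depends on $\sigma_0$, $\sigma_1$ and $\varrho$. Gathering this last upper-bound together with (28)
and the Pinsker inequality, we finally get, with our choice of $h_n$,

and this term can be made arbitrarily small by picking $a_{\delta,K}$ small enough.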
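
The constant in front of the sums in (28) is the Kullback-Leibler divergence between two bivariate Gaussian laws with the same covariance matrix whose means differ by $(d, d)$, divided by $d^2$. The following sketch checks this identity numerically using the explicit inverse of the $2 \times 2$ noise covariance matrix; the function names are hypothetical.

```python
def kl_mean_shift(d, s0, s1, rho):
    """KL(N(0, S) || N((d, d), S)) = 0.5 * Delta^T S^{-1} Delta with Delta = (d, d).

    S = [[s0^2, rho*s0*s1], [rho*s0*s1, s1^2]] is the noise covariance matrix.
    """
    det = s0 * s0 * s1 * s1 * (1.0 - rho * rho)
    inv = [[s1 * s1 / det, -rho * s0 * s1 / det],
           [-rho * s0 * s1 / det, s0 * s0 / det]]
    quad = d * (inv[0][0] * d + inv[0][1] * d) + d * (inv[1][0] * d + inv[1][1] * d)
    return 0.5 * quad


def coefficient(s0, s1, rho):
    """Constant appearing in (28)."""
    return (s0 * s0 + s1 * s1 - 2.0 * s0 * s1 * rho) / (2.0 * s0 * s0 * s1 * s1 * (1.0 - rho * rho))
```

For any shift `d`, `kl_mean_shift(d, s0, s1, rho)` agrees with `coefficient(s0, s1, rho) * d ** 2`, which is the per-cell contribution $(f^* - f_n^*)^2(X_u)$ weighted as in (28).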

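6.5. Proof of Proposition 7. For the choice $h_n \propto |\mathbb T_n|^{-\alpha}$ with $\alpha \in (0,1)$, relying successively
on (24) and (25), we deduce

$\sum_{n \ge 0} \mathbb E_\mu\big[(M_{\iota,n}(x))^2\big] < \infty$ and $\sum_{n \ge 0} \mathbb E_\mu\big[(L_{\iota,n}(x) - (\nu f_\iota)(x))^2\big] < \infty.$

Thus, as $n \to \infty$,

$M_{\iota,n}(x) \to 0$, $\mathbb P_\mu$-a.s. and $L_{\iota,n}(x) \to (\nu f_\iota)(x)$, $\mathbb P_\mu$-a.s.

From Proposition 16 and since $\varpi_n \to 0$,

$\hat\nu_n(x) \vee \varpi_n \to \nu(x), \quad \mathbb P_\mu\text{-a.s.}$

We conclude reminding that

$\hat f_{\iota,n}(x) = \dfrac{M_{\iota,n}(x) + L_{\iota,n}(x)}{\hat\nu_n(x) \vee \varpi_n}.$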

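6.6. Proofs of Theorems 8 and 9.

Proof of Theorem 8. Set $x$ in the interior of $\mathcal D$. The strategy is to use the following decomposition,
which is slightly different from the one used in the proof of Theorem 5,

$\sqrt{|\mathbb T_n| h_n} \begin{pmatrix} \hat f_{0,n}(x) - f_0(x) \\ \hat f_{1,n}(x) - f_1(x) \end{pmatrix} = \dfrac{1}{\hat\nu_n(x) \vee \varpi_n} \Big( \sqrt{|\mathbb T_n| h_n} \begin{pmatrix} M_{0,n}(x) \\ M_{1,n}(x) \end{pmatrix} + \sqrt{|\mathbb T_n| h_n} \begin{pmatrix} N_{0,n}(x) \\ N_{1,n}(x) \end{pmatrix} + \sqrt{|\mathbb T_n| h_n} \begin{pmatrix} R_{0,n}(x) \\ R_{1,n}(x) \end{pmatrix} \Big)$

where, for $\iota \in \{0,1\}$, $M_{\iota,n}(x)$ is defined by (21),

(29) $N_{\iota,n}(x) = \dfrac{1}{|\mathbb T_n|} \sum_{u \in \mathbb T_n} K_{h_n}(x - X_u)\big(f_\iota(X_u) - f_\iota(x)\big),$

(30) $R_{\iota,n}(x) = \big(\hat\nu_n(x) - (\hat\nu_n(x) \vee \varpi_n)\big) f_\iota(x),$

and $\hat\nu_n(x)$ is defined by (16). The first part of the decomposition is called main term, the second
part negligible term and the third part is a remainder term due to the truncation of the denominator
of the estimators. The strategy is the following: prove first that the last two terms go to zero
almost surely and prove a central limit theorem for the main term in a second step.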

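Step 1. Negligible and remainder terms, $N_{\iota,n}(x)$ and $R_{\iota,n}(x)$. We use the decomposition $N_{\iota,n}(x) =
N^{(1)}_{\iota,n}(x) + N^{(2)}_{\iota,n}(x)$ where

(31) $N^{(1)}_{\iota,n}(x) = \dfrac{1}{|\mathbb T_n|} \sum_{u \in \mathbb T_n} \mathbb E_\nu\big[K_{h_n}(x - X_u)(f_\iota(X_u) - f_\iota(x))\big],$

(32) $N^{(2)}_{\iota,n}(x) = \dfrac{1}{|\mathbb T_n|} \sum_{u \in \mathbb T_n} \Big(K_{h_n}(x - X_u)(f_\iota(X_u) - f_\iota(x)) - \mathbb E_\nu\big[K_{h_n}(x - X_u)(f_\iota(X_u) - f_\iota(x))\big]\Big).$

We claim that

$\sqrt{|\mathbb T_n| h_n}\, N^{(1)}_{\iota,n}(x) \to 0$ and $\sqrt{|\mathbb T_n| h_n}\, N^{(2)}_{\iota,n}(x) \to 0$, $\mathbb P_\mu$-a.s.

as $n \to \infty$.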

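Step 1.1. Set

(33) $H_n(\cdot) = K(h_n^{-1}(x - \cdot))\big(f_\iota(\cdot) - f_\iota(x)\big)$

with $h_n$ sufficiently small such that $x - h_n\,\mathrm{supp}(K) \subset \mathcal D$. By the many-to-one formula (13), after
a decomposition of the subtree $\mathbb T_n$ into generations,

(34) $N^{(1)}_{\iota,n}(x) = \dfrac{1}{|\mathbb T_n|} \sum_{m=0}^{n} |\mathbb G_m|\, \mathbb E_\nu\big[K_{h_n}(x - Y_m)(f_\iota(Y_m) - f_\iota(x))\big] = h_n^{-1} \nu(H_n)$

since $\nu$ is the invariant measure of the tagged-branch chain $(Y_m)_{m \ge 0}$ and

$\nu(H_n) = \int_{\mathbb R} K(h_n^{-1}(x - y))\big(f_\iota(y) - f_\iota(x)\big)\nu(y)\,dy$
$= h_n \int_{\mathbb R} K(y)\big(f_\iota(x - h_n y) - f_\iota(x)\big)\nu(x - h_n y)\,dy$
$= h_n \int_{\mathbb R} K(y)\Big(\big((\nu f_\iota)(x - h_n y) - (\nu f_\iota)(x)\big) - \big(\nu(x - h_n y) - \nu(x)\big) f_\iota(x)\Big)\,dy.$

We now use that both $(\nu f_\iota)$ and $\nu$ have derivatives up to order $\lfloor\beta\rfloor$. Also remind that $K$ is of
order $n_0 > \beta$. By a Taylor expansion, for some $\vartheta$ and $\vartheta' \in (0,1)$,

(35) $\nu(H_n) = h_n \int_{\mathbb R} K(y) \dfrac{(-h_n y)^{\lfloor\beta\rfloor}}{\lfloor\beta\rfloor!} \Big((\nu f_\iota)^{(\lfloor\beta\rfloor)}(x - \vartheta h_n y) - \nu^{(\lfloor\beta\rfloor)}(x - \vartheta' h_n y) f_\iota(x)\Big)\,dy$
$= h_n \int_{\mathbb R} K(y) \dfrac{(-h_n y)^{\lfloor\beta\rfloor}}{\lfloor\beta\rfloor!} \Big(\big((\nu f_\iota)^{(\lfloor\beta\rfloor)}(x - \vartheta h_n y) - (\nu f_\iota)^{(\lfloor\beta\rfloor)}(x)\big) - \big(\nu^{(\lfloor\beta\rfloor)}(x - \vartheta' h_n y) - \nu^{(\lfloor\beta\rfloor)}(x)\big) f_\iota(x)\Big)\,dy.$

Thus, using $(\nu f_\iota) \in \mathcal H^\beta(L'')$ and $\nu \in \mathcal H^\beta(L'')$ for some $L'' > 0$,

$|\nu(H_n)| \lesssim h_n \int_{\mathbb R} |K(y)| \dfrac{|h_n y|^{\lfloor\beta\rfloor}}{\lfloor\beta\rfloor!} \Big(L'' |\vartheta h_n y|^{\beta - \lfloor\beta\rfloor} + L'' |\vartheta' h_n y|^{\beta - \lfloor\beta\rfloor} |f_\iota(x)|\Big)\,dy \lesssim h_n^{1+\beta}.$

Hence, recalling (34), $N^{(1)}_{\iota,n}(x) \lesssim h_n^\beta$ and $\sqrt{|\mathbb T_n| h_n}\, N^{(1)}_{\iota,n}(x)$ goes to zero when $n$ goes to infinity
choosing $h_n \propto |\mathbb T_n|^{-\alpha}$ with $\alpha > 1/(1 + 2\beta)$.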

Step 1.2. In the same way we have proved $|\nu(H_n)| \lesssim h_n^{1+\beta}$, we prove $|QH_n(y)| \lesssim h_n^{1+\beta}$ using the
fact that $z \mapsto f_\iota(z)Q(y,z)$ and $z \mapsto Q(y,z)$ belong to $\mathcal H^\beta(L'')$ for some other $L'' > 0$ for any fixed
$y \in \mathbb R$. This enables us to reinforce the inequality of Lemma 15(i) and using Lemma 15(ii) we
obtain

$|Q^l H_n(y) - \nu(H_n)| \lesssim h_n^{1+\beta} \wedge (\mathbb V(y)\rho^l), \quad l \ge 1.$

It brings the following upper-bound using the same technique as in Step 1 and Step 2 of the proof of
Proposition 14:

^^[{N[^{x)f]<hi{\T^\hr:)-\ 

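which, by the Borel-Cantelli lemma, leads to the $\mathbb P_\mu$-almost sure convergence of $\sqrt{|\mathbb T_n| h_n}\, N^{(2)}_{\iota,n}(x)$
to zero, choosing $h_n \propto |\mathbb T_n|^{-\alpha}$ with $\alpha > 0$.

Step 1.3. To end the first step of the proof, we prove that the remainder term is such that

(36) $\sqrt{|\mathbb T_n| h_n}\, R_{\iota,n}(x) \to 0, \quad \mathbb P_\mu\text{-a.s.}$

as $n \to \infty$. Write

(37) $R_{\iota,n}(x) = (\hat\nu_n(x) - \varpi_n)\, f_\iota(x)\, \mathbf 1_{\{\hat\nu_n(x) < \varpi_n\}}$

where $(\hat\nu_n(x) - \varpi_n)$ converges $\mathbb P_\mu$-almost surely to $\nu(x)$. We can easily prove that $\mathbf 1_{\{\hat\nu_n(x) < \varpi_n\}}$
converges $\mathbb P_\mu$-almost surely to 0, since $\hat\nu_n(x)$ converges $\mathbb P_\mu$-almost surely to $\nu(x)$ and $\varpi_n \to 0$,
which means that $\mathbf 1_{\{\hat\nu_n(x) < \varpi_n\}}$ is null $\mathbb P_\mu$-almost surely beyond some integer. So

$\sqrt{|\mathbb T_n| h_n}\, \mathbf 1_{\{\hat\nu_n(x) < \varpi_n\}} = 0, \quad \mathbb P_\mu\text{-a.s.}$

beyond some integer and (36) is thus proved.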

Step 2. Main term $M_{\iota,n}(x)$. We will make use of the central limit theorem for martingale triangular
arrays (see for instance Duflo [25], Theorem 2.1.9, p 46). We follow Delmas and Marsalle [17]
(section 4) in order to define the notion of the $n$ first individuals of $\mathbb T$. Let $(\Pi^*_m)_{m \ge 1}$ be independent
random variables, where for each $m$, $\Pi^*_m$ is uniformly distributed over the set of permutations
of $\mathbb G_m$. The collection $(\Pi^*_m(1), \dots, \Pi^*_m(|\mathbb G_m|))$ is a random drawing without replacement of all
the elements of $\mathbb G_m$. For $k \ge 1$, set $\rho_k = \inf\{k' \ge 0,\ k \le |\mathbb T_{k'}|\}$ (it can be seen as the number of the
generation to which the $k$-th element of $\mathbb T$ belongs). We finally define a random order on $\mathbb T$ through
$\Pi$, the function from $\{1, 2, \dots\}$ to $\mathbb T$ such that $\Pi(1) = \emptyset$ and for $k \ge 2$, $\Pi(k) = \Pi^*_{\rho_k}(k - |\mathbb T_{\rho_k - 1}|)$.
We introduce the filtration $\mathcal G = (\mathcal G_n, n \ge 0)$ defined by $\mathcal G_0 = \sigma(X_\emptyset)$ and for each $n \ge 1$,

$\mathcal G_n = \sigma\Big(\big(X_{\Pi(k)}, X_{(\Pi(k),0)}, X_{(\Pi(k),1)}\big)_{1 \le k \le n},\ (\Pi(k))_{1 \le k \le n+1}\Big).$
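
The random order $\Pi$ can be sketched as follows. This is only an illustration with hypothetical function names: nodes are encoded as tuples of 0/1 labels (the empty tuple standing for the root $\emptyset$), and each generation is shuffled uniformly at random before being appended, so that $\Pi(k)$ always belongs to generation $\rho_k$.

```python
import random


def random_order(n_generations, rng):
    """Return [Pi(1), Pi(2), ...] for T_n: the root, then each generation in random order."""
    order = [()]  # Pi(1) is the root
    for m in range(1, n_generations + 1):
        # all nodes of G_m, written as binary words of length m
        gen = [tuple((j >> (m - 1 - b)) & 1 for b in range(m)) for j in range(2 ** m)]
        rng.shuffle(gen)  # uniform random permutation of G_m
        order.extend(gen)
    return order


def generation_index(k):
    """rho_k = inf{k' >= 0 : k <= |T_{k'}|}, where |T_{k'}| = 2^(k'+1) - 1."""
    kp = 0
    while k > 2 ** (kp + 1) - 1:
        kp += 1
    return kp
```

Exploring the tree in this order reveals one mother cell and its two daughters at a time, which is what makes the partial sums over the first $k$ individuals a martingale with respect to $\mathcal G$.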


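For $n \ge 1$, we consider the vector of bivariate random variables $E^{(n)}(x) = (E^{(n)}_k(x), 1 \le k \le |\mathbb T_n|)$
defined by

$E^{(n)}_k(x) = \sum_{l=1}^{k} e^{(n)}_l(x)$ with $e^{(n)}_l(x) = (|\mathbb T_n| h_n)^{-1/2} K\big(h_n^{-1}(x - X_{\Pi(l)})\big) \begin{pmatrix} \varepsilon_{(\Pi(l),0)} \\ \varepsilon_{(\Pi(l),1)} \end{pmatrix}.$

Notice that $E^{(n)}(x)$ is a square-integrable martingale adapted to $\mathcal G = (\mathcal G_n)_{n \ge 0}$. Then, $(E^{(n)}(x),
n \ge 1)$ is a square-integrable $\mathcal G$-martingale triangular array whose bracket is given by

(38) $\langle E^{(n)}(x) \rangle_k = \sum_{l=1}^{k} \mathbb E\big[e^{(n)}_l(x)\,(e^{(n)}_l(x))^t \,\big|\, \mathcal G_{l-1}\big] = \dfrac{1}{|\mathbb T_n| h_n} \sum_{l=1}^{k} K^2\big(h_n^{-1}(x - X_{\Pi(l)})\big)\, \Gamma$

where $\Gamma$ is the noise covariance matrix. We apply Proposition 16 (with $K^2/|K|_2^2$ replacing $K$ as
kernel function) and we obtain

(39) $\langle E^{(n)}(x) \rangle_{|\mathbb T_n|} = \Big(\dfrac{1}{|\mathbb T_n| h_n} \sum_{u \in \mathbb T_n} K^2\big(h_n^{-1}(x - X_u)\big)\Big) \Gamma \to |K|_2^2\, \nu(x)\, \Gamma, \quad \mathbb P_\mu\text{-a.s.}$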

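as $n \to \infty$. Condition (A1) of Theorem 2.1.9 of [25] is satisfied, this is exactly (39). Since the
bivariate random variables $((\varepsilon_{u0}, \varepsilon_{u1}), u \in \mathbb T)$ are independent and identically distributed and since
$\varepsilon_0$ and $\varepsilon_1$ have finite moment of order four,

$\sum_{k=1}^{|\mathbb T_n|} \mathbb E\big[\|E^{(n)}_k(x) - E^{(n)}_{k-1}(x)\|^4 \,\big|\, \mathcal G_{k-1}\big] = \dfrac{\mathbb E[(\varepsilon_0^2 + \varepsilon_1^2)^2]}{(|\mathbb T_n| h_n)^2} \sum_{u \in \mathbb T_n} K^4\big(h_n^{-1}(x - X_u)\big) = \dfrac{\mathbb E[(\varepsilon_0^2 + \varepsilon_1^2)^2]}{|\mathbb T_n| h_n} \Big(\dfrac{1}{|\mathbb T_n| h_n} \sum_{u \in \mathbb T_n} K^4\big(h_n^{-1}(x - X_u)\big)\Big)$

where $\|\cdot\|$ denotes the Euclidean norm for vectors and setting $E^{(n)}_0(x) = 0$. We apply Proposition 16
(with $K^4/\int K^4$ replacing $K$ as kernel function) and we conclude that

(40) $\sum_{k=1}^{|\mathbb T_n|} \mathbb E\big[\|E^{(n)}_k(x) - E^{(n)}_{k-1}(x)\|^4 \,\big|\, \mathcal G_{k-1}\big] \to 0, \quad \mathbb P_\mu\text{-a.s.}$

The Lyapunov condition (40) implies the Lindeberg condition (A2) of Theorem 2.1.9 of [25] (see
section 2.1.4, p 47 of [25]). Therefore, by the central limit theorem for martingale triangular arrays,

$E^{(n)}_{|\mathbb T_n|}(x) = \sqrt{|\mathbb T_n| h_n} \begin{pmatrix} M_{0,n}(x) \\ M_{1,n}(x) \end{pmatrix} \xrightarrow{d} \mathcal N_2\big(0_2,\ |K|_2^2\, \nu(x)\, \Gamma\big).$

We conclude gathering Step 1 and Step 2, together with

$\hat\nu_n(x) \vee \varpi_n \to \nu(x), \quad \mathbb P_\mu\text{-a.s.}$

and the Slutsky lemma.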

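Step 3. Independence. Let $x_1 \neq x_2 \in \mathcal D$. We repeat Step 2 for

$E^{(n)}_k(x_1, x_2) = \sum_{l=1}^{k} e^{(n)}_l(x_1, x_2)$ with $e^{(n)}_l(x_1, x_2) = \begin{pmatrix} e^{(n)}_l(x_1) \\ e^{(n)}_l(x_2) \end{pmatrix}$

and we are led to

$E^{(n)}_{|\mathbb T_n|}(x_1, x_2) = \sqrt{|\mathbb T_n| h_n} \begin{pmatrix} M_{0,n}(x_1) \\ M_{1,n}(x_1) \\ M_{0,n}(x_2) \\ M_{1,n}(x_2) \end{pmatrix} \xrightarrow{d} \mathcal N_4\Big(0_4,\ |K|_2^2 \begin{pmatrix} \nu(x_1)\Gamma & 0_{2,2} \\ 0_{2,2} & \nu(x_2)\Gamma \end{pmatrix}\Big)$

with $0_4$ and $0_{2,2}$ respectively the null vector of size 4 and the null matrix of size 2x2, which
shows asymptotic independence between $(M_{0,n}(x_1), M_{1,n}(x_1))$ and $(M_{0,n}(x_2), M_{1,n}(x_2))$ and thus
between $(\hat f_{0,n}(x_1), \hat f_{1,n}(x_1))$ and $(\hat f_{0,n}(x_2), \hat f_{1,n}(x_2))$ as asserted. □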

Proof of Theorem 9. Step 1. Case k < ∞. We look carefully at the proof of Theorem 8 and see
that only Step 1.1 has to be reconsidered. We prove that

—> Kv{x)m{x). 

Indeed for f3 an integer, using (34) and (35), 

y/\Tr,\hnNj;^2{x) = ^ f y/3^(y) _ §hnV) - {x - 'O'hnV) f,{x)) dy 

P- jR 

and we conclude letting n go to infinity since {vfu)^ and are continuous. 

Step 2. Case k = ∞. With the same argument we prove that, in that case,

as $n \to \infty$. Looking at Steps 1.2 and 1.3 in the proof of Theorem 8, we obtain
h~^ {x) —0 and h~^ —> 0, — a.s. 

In addition, 

since we work in the case ^ k = oo and y^\Tn\hn Mi,^n{x) is asymptotically Gaussian 

as proved previously (Step 3 in the proof of Theorem 8). □ 

Remark 19. If $\beta$ is not an integer, we could generalize Theorem 9 but at the cost of introducing
fractional derivatives. Note that the definition of this notion is not unique (see [41] or [43]). We
restrict the parameter $\beta$ to be an integer in order to avoid here additional technicalities.

6.7. Proofs of Corollary 10 and Theorem 11. 

Proof of Corollary 10. On the one hand, applying Theorem 9 to the estimator built with the 
bandwidth oc 

\Tn\^An{x) J\f 2 {m 2 {x) ,'E 2 {x)) with (x) = [ 

\fl,n\^) fi\^) J 

On the other hand, applying Theorem 9 to the estimator fj:^n built with the bandwidth hn'^ oc 

|T„|-^/(2/5+i) for Je (0,1), 

|T„|5^S„(x) ^ m 2 (x) with Bnix) = 

Combining these two results, we obtain 


IT IWT (kni^) - fo{x) 
V/i,„(x)-/i(x) 


as announced. 


|T„|=^/5+M„(x) - |T„| 2/5+1 B„(x) 
{ 1 - 6)0 
1-|T„| 2/3 + 1 


3^2(02 , S 2 ( x )), 


□ 


Proof of Theorem 11. The study of the test statistics $W_n(x_1, \dots, x_k)$ under $\mathcal H_0$ and $\mathcal H_1$ then follows
classical lines. We give here the main argument for k = 2. By Corollary 10 and using in addition the
asymptotic independence stated in Theorem 8, we obtain

ffpA^i) - fo{xi)\ 
fi,n{xi) - fi{xi) 
fo,n{x2) - fo{x2) 

\flAx2) - fl{x2)J 

Then, using the Delta-method, 


|T, 


• 7 V' 4 ( 04 , S4(xi,X2)) with S4(xi,X2)= ^ ^ 2(3 


2,2 

(a/2) 




with 


S2(xi, X2) = {KAA +cr(- 2 ao(Tig) 


Ai)y 


{y{x2)Y 
which boils down to, under $\mathcal H_0$,

_ - /i,n(a;i))\ d ^ , N 

\/\K\2{(Tq + af — 2aoaig) \>^{x2Y^^{fo,n{x2) — fi,n{x2))J 

with $I_2$ the identity matrix of size 2x2. The replacement of $\nu(\cdot)$ by its estimator $\hat\nu_n(\cdot)$ is licit
by the Slutsky theorem. Thus, under $\mathcal H_0$: $W_n(x_1, x_2) \xrightarrow{d} \chi^2(2)$, the chi-squared distribution
with 2 degrees of freedom. Under $\mathcal H_1$, we prove that $W_n(x_1, x_2)$ converges $\mathbb P_\mu$-almost surely to $+\infty$
following the same lines and using

2 2 

^ ^ u[xi){fQ{xi) - fl{xi))^ ^0, P/, - a.5. 

1^1 1^1 


when "Hi is valid. 


□ 
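
The argument above suggests how a p-value can be computed once the estimates are available. The sketch below is an illustration under explicit assumptions: the statistic is taken of the form suggested by the normalisation in this proof, namely a squared rate `scale` times $\sum_l \hat\nu_n(x_l)(\hat f_{0,n}(x_l) - \hat f_{1,n}(x_l))^2$ divided by $|K|_2^2(\sigma_0^2 + \sigma_1^2 - 2\sigma_0\sigma_1\varrho)$ (the exact definition (10) is given earlier in the paper and may differ in its normalisation), and the chi-squared tail is computed with the closed form that is valid only for an even number $k$ of degrees of freedom. All function and parameter names are hypothetical.

```python
import math


def chi2_sf_even(w, k):
    """P(chi2(k) > w) for even k: exp(-w/2) * sum_{j < k/2} (w/2)^j / j!."""
    if k <= 0 or k % 2 != 0:
        raise ValueError("closed form implemented for even k only")
    half = w / 2.0
    term, total = 1.0, 0.0
    for j in range(k // 2):
        if j > 0:
            term *= half / j  # (w/2)^j / j!
        total += term
    return math.exp(-half) * total


def asymmetry_statistic(nu_hat, f0_hat, f1_hat, scale, k_l2_sq, s0, s1, rho):
    """Wald-type statistic built from estimates on a grid x_1, ..., x_k.

    scale: squared normalising rate (assumption, e.g. |T_n| * h_n);
    k_l2_sq: |K|_2^2; s0, s1, rho: noise standard deviations and correlation.
    """
    var = k_l2_sq * (s0 * s0 + s1 * s1 - 2.0 * s0 * s1 * rho)
    return scale * sum(nu * (a - b) ** 2 for nu, a, b in zip(nu_hat, f0_hat, f1_hat)) / var
```

For $k = 2$ the p-value is simply `chi2_sf_even(w, 2)`, that is $e^{-w/2}$, and the null hypothesis is rejected at level 5% when $w > 2\ln 20 \approx 5.99$.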


7. Appendix 

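7.1. Proof of Lemma 12. Let us first prove (i), that is (13), see Delmas and Marsalle [17] (Lemma 2.1). For
$u = (u_1, u_2, \dots, u_m) \in \mathbb G_m$,

$\mathbb E_\mu[g(X_u)] = \mu\big(\mathcal P_{u_1} \mathcal P_{u_2} \cdots \mathcal P_{u_m}(g)\big).$

Then

$\mathbb E_\mu\Big[\sum_{u \in \mathbb G_m} g(X_u)\Big] = \sum_{u \in \mathbb G_m} \mu\big(\mathcal P_{u_1} \mathcal P_{u_2} \cdots \mathcal P_{u_m}(g)\big) = \mu\Big(\sum_{(u_1, \dots, u_m) \in \{0,1\}^m} \mathcal P_{u_1} \mathcal P_{u_2} \cdots \mathcal P_{u_m}(g)\Big)$
$= \mu\Big(2^m \Big(\dfrac{\mathcal P_0 + \mathcal P_1}{2}\Big)^m (g)\Big) = |\mathbb G_m|\, \mu(Q^m g)$

since $Q = (\mathcal P_0 + \mathcal P_1)/2$ and $|\mathbb G_m| = 2^m$. We also know that $\mu(Q^m g) = \mathbb E_\mu[g(Y_m)]$.

We now turn to (ii), that is (14). We refer to Guyon [27] for another strategy of proof (proof of Equation (7)).
Some notation first: for $u = (u_1, \dots, u_m)$ and $v = (v_1, \dots, v_n)$ in $\mathbb T$, we write $uv =
(u_1, \dots, u_m, v_1, \dots, v_n)$ for the concatenation. For $m \ge 0$, we denote by $\mathcal F_m$ the sigma-field generated
by $(X_u, |u| \le m)$.

For $m \ge 1$, whenever $u \neq v \in \mathbb G_m$, there exist $w \in \mathbb T_{m-1}$ together with $i \neq j \in \{0,1\}$ and
$\tilde u, \tilde v$ such that $u = wi\tilde u$ and $v = wj\tilde v$, where we call $w$ the most recent common ancestor of $u$ and
$v$. The main argument uses consecutively a first conditioning by $\mathcal F_{|w|+1}$ which lets $X_u$ and $X_v$
conditionally independent and a conditional many-to-one formula of kind (i), a second conditioning
by $\mathcal F_{|w|}$ and the definition of the $\mathbb T$-transition $\mathcal P$ and finally the many-to-one formula (i),

$\mathbb E_\mu\Big[\sum_{u \neq v \in \mathbb G_m} g(X_u) g(X_v)\Big] = 2 \sum_{l=1}^{m} \sum_{w \in \mathbb G_{m-l}} \mathbb E_\mu\Big[\sum_{\tilde u \in \mathbb G_{l-1}} g(X_{w0\tilde u}) \sum_{\tilde v \in \mathbb G_{l-1}} g(X_{w1\tilde v})\Big]$
$= 2 \sum_{l=1}^{m} \sum_{w \in \mathbb G_{m-l}} \mathbb E_\mu\Big[2^{l-1} Q^{l-1} g(X_{w0})\; 2^{l-1} Q^{l-1} g(X_{w1})\Big]$
$= 2 \sum_{l=1}^{m} (2^{l-1})^2 \sum_{w \in \mathbb G_{m-l}} \mathbb E_\mu\big[\mathcal P(Q^{l-1} g \otimes Q^{l-1} g)(X_w)\big]$
$= 2 \sum_{l=1}^{m} (2^{l-1})^2\, |\mathbb G_{m-l}|\, \mathbb E_\mu\big[\mathcal P(Q^{l-1} g \otimes Q^{l-1} g)(Y_{m-l})\big]$

as asserted.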
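
Both many-to-one formulae can be verified by brute force on a toy example. The sketch below is not the NBAR model: it replaces the real-valued state space by two states and uses a hypothetical transition table `P`, so that every configuration of the first two generations can be enumerated exactly with rational arithmetic and compared with the right-hand sides of (13) and (14) for $m = 2$.

```python
from fractions import Fraction as Fr
from itertools import product

STATES = (0, 1)
# Hypothetical transition P(x; y, z): joint law of the two daughters given the mother.
P = {
    0: {(0, 0): Fr(1, 2), (0, 1): Fr(1, 4), (1, 0): Fr(1, 8), (1, 1): Fr(1, 8)},
    1: {(0, 0): Fr(1, 10), (0, 1): Fr(2, 5), (1, 0): Fr(1, 5), (1, 1): Fr(3, 10)},
}


def Q(g):
    """Tagged-branch kernel Q = (P_0 + P_1) / 2 applied to g (dict over states)."""
    return {x: sum(p * (g[y] + g[z]) / 2 for (y, z), p in P[x].items()) for x in STATES}


def P_tensor(g, h):
    """(P(g (x) h))(x) = E[g(X_u0) h(X_u1) | X_u = x]."""
    return {x: sum(p * g[y] * h[z] for (y, z), p in P[x].items()) for x in STATES}


def brute_force_depth2(g, root):
    """Exact E[sum_{u in G_2} g(X_u)] and E[sum_{u != v in G_2} g(X_u) g(X_v)]."""
    s1, s2 = Fr(0), Fr(0)
    for (x0, x1) in product(STATES, repeat=2):
        for (x00, x01, x10, x11) in product(STATES, repeat=4):
            prob = P[root][(x0, x1)] * P[x0][(x00, x01)] * P[x1][(x10, x11)]
            leaves = [g[x00], g[x01], g[x10], g[x11]]
            total = sum(leaves)
            s1 += prob * total
            s2 += prob * (total * total - sum(v * v for v in leaves))  # ordered pairs u != v
    return s1, s2


def many_to_one_depth2(g, root):
    """Right-hand sides of (13) and (14) for m = 2 and initial law delta_root."""
    Qg = Q(g)
    rhs13 = 4 * Q(Qg)[root]
    rhs14 = 4 * (Q(P_tensor(g, g))[root] + 2 * P_tensor(Qg, Qg)[root])
    return rhs13, rhs14
```

For any test function `g` with rational values, `brute_force_depth2(g, root)` and `many_to_one_depth2(g, root)` return the same pair of fractions for both starting states.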

7.2. Proof of Lemma 15. First,

$|QH_n(y)| \le \int_{\mathbb R} |F(h_n^{-1}(x - z))|\,|G(z)|\,Q(y,z)\,dz$
$= h_n \int_{\mathrm{supp}(F)} |F(z)|\,|G(x - h_n z)|\,Q(y, x - h_n z)\,dz \le h_n |G|_{\mathcal D} |Q|_{\mathbb R,\mathcal D} |F|_1$

for $h_n$ such that $x - h_n\,\mathrm{supp}(F) \subset \mathcal D$ (remind that $x$ belongs to the interior of $\mathcal D$). We prove in
the same way the bound on $\nu(H_n)$. Hence we have proved (i) and we now turn to (ii). The first
bound $h_n$ obviously comes from (i) and it remains to prove the second bound $\mathbb V(y)\rho^m$. Under
Assumption 2, we apply Lemma 13 to $g = H_n/|H_n|_\infty$ and it brings

$|Q^m H_n(y) - \nu(H_n)| \le R\,|H_n|_\infty\, \rho^m\, \mathbb V(y).$

Since $|H_n|_\infty \le |F|_\infty |G|_{\mathcal D}$ for $h_n$ such that $x - h_n\,\mathrm{supp}(F) \subset \mathcal D$, we obtain the announced
upper-bound.

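7.3. Proof of Lemma 17. For every $|z| \le d$,

$\nu(z) = \int_{\mathbb R} \nu(y) Q(y,z)\,dy \ge \inf_{|y| \le M_2,\ |z| \le d} Q(y,z) \int_{|y| \le M_2} \nu(y)\,dy.$

On the one hand,

$\inf_{|x| \le M_2,\ |y| \le d} Q(x,y) \ge \tfrac12 \Big(\inf_{|x| \le M_2,\ |y| \le d} G_0(y - f_0(x)) + \inf_{|x| \le M_2,\ |y| \le d} G_1(y - f_1(x))\Big) \ge \delta(d + (\ell + \gamma M_2)) \ge \delta(M_3) > 0$

if $d > 0$ is such that $d + (\ell + \gamma M_2) \le M_3$, which is possible by Assumption 3. On the other hand,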


' |y|>M2 


y[y)dy = 




v{x)Q{x,y)dxdy 


< M. 


'|2/|>M2 Jx& 


;(Go{y - foix)) +Gi{y- fi{x))^dxdy < 7 (^/ 2 ) 
since $(f_0, f_1)$ belongs to $\mathcal F(\gamma,\ell)^2$ and $(G_0, G_1)$ to $\mathcal G(r,\lambda)^2$. We know in addition that $|\nu|_\infty \le
|Q|_{\mathbb R,\mathbb R} \le (|G_0|_\infty + |G_1|_\infty)/2 < \infty$. Using $\eta(M_2) < 1$ brings

$\int_{|y| \le M_2} \nu(y)\,dy > 0.$

Hence $\nu(y) \ge \delta(M_3)(1 - \eta(M_2)) > 0$ for any $|y| \le d$. We have uniformity in $f_0$ and $f_1$ since $\delta(M_3)$
and $\eta(M_2)$ are uniform bounds on the class $\mathcal F(\gamma,\ell)^2$.